What is here called controlled natural language (CNL) has traditionally been given many
different names. Especially during the last four decades, a wide variety of such languages 
have been designed. They are applied to improve communication among humans, to improve 
translation, or to provide natural and intuitive representations for formal notations. Despite 
the apparent differences, it seems sensible to put all these languages under the same umbrella. 
To bring order to the variety of languages, a general classification scheme is presented here. 
A comprehensive survey of existing English-based CNLs is given, listing and describing 100 
languages from 1930 until today. Classification of these languages reveals that they form a 
single scattered cloud filling the conceptual space between natural languages such as English 
on the one end and formal languages such as propositional logic on the other. The goal of this 
article is to provide a common terminology and a common model for CNL, to contribute to the 
understanding of their general nature, to provide a starting point for researchers interested in 
the area, and to help developers to make design decisions. 

1. Introduction 

Controlled, processable, simplified, technical, structured, and basic are just a few examples 
of attributes given to constructed languages of the type to be discussed here. We will
call them controlled natural languages (CNL) or simply controlled languages. Basic
English, Caterpillar Fundamental English, SBVR Structured English, and Attempto
Controlled English are some examples; many more will be presented below. This article 
investigates the nature of such languages, provides a general classification scheme, and 
explores existing approaches. 

As the variety of attributes suggests, there is no general agreement on the characteristic
properties of CNL, making it a very fuzzy term. There are two main reasons
for this. First, CNL approaches emerged in different environments (industry, academia,
and government), in different disciplines (computer science, philosophy, linguistics,
and engineering), and over many decades (from the 1930s until today). People from
different backgrounds often used and continue to use different names for the same kind
of language. Second, although controlled natural languages seem to share important
properties, they also exhibit a very wide variety: Some are inherently ambiguous, others
are as precise as formal logic; virtually everything can be expressed in some, only
very little in others; some look perfectly natural, others look more like programming
languages; some are defined by just a handful of grammar rules, others are so complex
that no complete grammar exists. This variety makes it difficult to get a clear picture
of the fundamental properties. This article aims at resolving this problem by giving an
overview of existing CNLs and by providing a general classification scheme. Generally,
this work has several, partly overlapping goals, ranging from purely theoretical to more
practical objectives (listed in this order):

• To give us a better understanding of the nature of CNL

• To establish a common terminology and a common model for CNL

• To provide a starting point for researchers interested in CNL

• To help CNL developers make design decisions

Although a wide variety of CNLs have been applied to a wide variety of problem
domains, virtually all of them seem to be relevant to the field of computational
linguistics. Among other techniques, they involve lexical analyses, grammar and style
checking, ambiguity detection, machine translation, and computational semantics.

Unsurprisingly, most CNLs are based on English. For the sake of simplicity, the
survey presented in this article is restricted to these languages and excludes existing
approaches based on other natural languages, such as German and Chinese. The
classification scheme to be presented, however, is general and not restricted to English in any
way.

In what follows, the relevant background is discussed (Section 2), a classification
scheme is introduced (Section 3), existing English-based CNLs are classified and
described based on a small sample (Section 4), the results are analyzed (Section 5), and
finally the conclusions are discussed (Section 6). The appendix shows the full list of
languages with short descriptions for each of them.

2. Background 

Controlled natural language being such a fuzzy term, it is important to clarify its
meaning, to establish a common definition, and to understand the differences to related
terms. In addition, it is helpful to review previous attempts to classify and characterize
CNLs.

2.1 Definition 

As mentioned above, there is no generally agreed-upon definition for controlled natural
language and for closely related terms including controlled language, constrained natural
language, simplified language, and controlled English. The following two quotations
illustrate this:

A controlled language (CL) is a restricted version of a natural language which has been
engineered to meet a special purpose, most often that of writing technical
documentation for non-native speakers of the document language. A typical CL uses a
well-defined subset of a language's grammar and lexicon, but adds the terminology
needed in a technical domain. (Kittredge 2003)

Controlled natural language is a subset of natural language that can be accurately and
efficiently processed by a computer, but is expressive enough to allow natural usage by
non-specialists. (Fuchs and Schwitter 1995)

Both descriptions exhibit a strong bias towards one particular type of CNL (these types
are discussed in more detail below): The first quotation focuses on technical languages
that are designed to improve comprehensibility, whereas the second one only covers
languages that can be interpreted by computers. They agree, however, on the fact that
a CNL is based on a certain natural language but is more restrictive. It is also generally
agreed that CNLs are constructed languages, which means languages that did not
emerge naturally but have been engineered. The use of the term subset is misleading
though, since many CNLs are not proper subsets of the underlying natural language.
Many of these languages have small deviations from natural grammar or semantics.
Others make use of unnatural elements such as colors and parentheses to increase
readability and precision. Some even consider the programming language COBOL
a controlled natural language (Sowa 2000). The subset relation in its mathematical
sense is clearly too strict to cover a large part of the languages commonly called CNL.
Although they all clearly share important properties, the specific languages can be
quite different in their coverage and nature. It is not surprising that O'Brien (2003),
who compared English-based CNLs of different types, comes to the conclusion that
no common core language can be identified. To meet these problems, the following
definition is proposed here:

Definition 1 (long) 

A language is called a controlled natural language if and only if it has all of the 
following four properties: 

1. It is based on exactly one natural language (its "base language"). 

2. The most important difference between it and its base language (but not
necessarily the only one) is that it is more restrictive concerning lexicon,
syntax, and/or semantics.

3. It preserves most of the natural properties of its base language, so that 
speakers of the base language can intuitively and correctly understand 
texts in the controlled natural language, at least to a substantial degree. 

4. It is a constructed language, which means that it is explicitly and 
consciously defined, and is not the product of an implicit and natural 
process (even though it is based on a natural language that is the product 
of an implicit and natural process). 

Properties 2 and 3 are deliberately vague, because it is not possible or desirable to draw 
a strict line there. Properties 1 and 3 refer to the N in CNL: naturalness; Properties 2 
and 4 refer to the C: control. We will later be able to be a little more precise concerning 
property number 3. We leave it for now, and we can summarize this relatively verbose 
definition in the form of the following short version: 

Definition 2 (short) 

A controlled natural language is a constructed language that is based on a certain 
natural language, being more restrictive concerning lexicon, syntax, and/or semantics
while preserving most of its natural properties. 

As a further remark, we should note that the term language is used in a sense that 
is restricted to sequential languages and excludes visual languages such as diagrams 


Computational Linguistics 


and the like. We can verify that the definitions above include virtually all languages
that have been called CNL, while they exclude natural languages (since they are not
constructed), languages such as Esperanto (since they are not based on one particular
natural language), and common formal languages (since they lack intuitive
understandability).

2.2 Related Terms 


Before we move on to examine the types and properties of languages, we should discuss
a number of terms that are related to CNL and are easy to confuse: sublanguage, 
fragments of language, style guide, phraseology, controlled vocabulary, and constructed 
language. 

Sublanguages are languages that naturally arise when "a community of speakers 
(i.e. 'experts') shares some specialized knowledge about a restricted semantic domain 
[and] the experts communicate about the restricted domain in a recurrent situation, or 
set of highly similar situations" (Kittredge 2003). As with controlled natural language, a
sublanguage is based on exactly one natural language and is more restricted. The crucial 
difference between the two terms is that sublanguages emerge naturally, whereas CNLs 
are explicitly and consciously defined. 

Fragments of language is a term denoting "a collection of sentences forming a
naturally delineated subset of [a natural] language" (Pratt-Hartmann and Third 2006). The
term is closely related to CNL and the difference seems to be mainly a methodological
one: Fragments of language are identified rather than defined, they are closely kept in the
context of the full natural language and related fragments, and their purpose is to be
studied theoretically rather than to be used directly to solve a particular problem. A
CNL can be seen as a fragment of a language "developed for the purpose of supporting
some technical activity" (Pratt-Hartmann 2009).

Style guides are documents containing instructions on how to write in a 
certain natural language. Some style guides such as "How to write clearly" 
(European Commission 2011) provide "hints, not rules" and therefore do not describe a
new language, but only give advice on how to use the given natural language. However,
other style guides such as the Plain Language Guidelines (PLAIN 2011) are stricter and
do describe a language that is not identical to the respective full language. The question 
of whether such a language can be considered a CNL depends on whether the style 
guide defines a new language or whether it merely describes good practices that have 
emerged naturally. 

Phraseology is a term that denotes a "set of expressions used by a particular 
person or group" (Houghton Mifflin Harcourt 2000). Typically, this term is used when
the grammatical structure is simpler than in full natural language. In contrast to
sublanguages and fragments of languages, a phraseology is not a selection of sentences but
a selection of phrases. Phraseologies can be natural or constructed, and in the latter case 
they are usually considered CNLs. 

Controlled vocabularies are standardized collections of names and expressions, 
including "lists of controlled terms, synonym rings, taxonomies, and thesauri"
(ANSI/NISO 2005). Mostly, controlled vocabularies target a specific, narrow domain. In
contrast to CNL, they do not deal with grammatical issues, that is, how to combine the 
terms to write complete sentences. Many CNL approaches, especially domain-specific 
ones, include controlled vocabularies. 

Constructed languages (or artificial languages or planned languages) are languages
that did not emerge naturally but have been consciously defined. In this broad
sense, the term includes (but is not limited to) languages such as Esperanto, programming
languages, and CNLs.

2.3 Types and Properties 


Let us now turn to the nature of CNLs. To bring order to their seemingly chaotic variety, 
more than 40 properties of such languages and their environments have been identified
(Wyner et al. 2010). Many of these properties, however, are fuzzy and do not allow for a
strict categorization. For the survey to be presented in Section 4, we collect nine general
and clear-cut properties and give them letter codes. As it turns out, however, these 
properties mainly describe the application environment of languages and not so much 
the languages themselves. For that reason, a classification scheme is introduced in the 
next section to describe the fundamental nature of CNLs and other languages. 

In general, controlled natural languages can be roughly subdivided according to
the problem they are supposed to solve (Schwitter 2002): to improve communication
among humans, especially speakers with different native languages (we will use the
letter code C for these languages); to improve manual, computer-aided, semi-automatic,
or automatic translation (T); and to provide a natural and intuitive representation for
formal notations (F). The last type includes approaches for automatic execution of texts,
which requires, at least conceptually, a mapping to an executable formalism. As we
will see, these three types emerged at different points in time: Type C is the oldest,
type T emerged later, and type F is the most recent of the three. Although this seems
to be a sensible and useful subdivision, a simpler version based on just two types
dominates the literature. Huijsen (1998) introduced the distinction between
"human-oriented" and "computer-oriented" languages. The former roughly corresponds to type
C, the latter to the types T and F. However, Huijsen observes that "it is often difficult to
qualify a controlled language as either human-oriented or machine-oriented, since often
simplification works both ways." Because these types describe problems rather than
languages, reusing a language in a different problem domain can change its type even
if the language itself has not changed at all. Other similar categorizations include the
distinction between "naturalistic" (types C and T) and "formalistic" (type F) languages
(Pool 2006; Clark et al. 2010) and the distinction between readability and translatability
(Reuther 2003).

Another apparent fact is that some languages originated from academia (letter
code A), some from industry (I), some from a government or a UN agency (G), and
others from a combination of the three. In addition, the distinction between general
purpose languages and those for a particular restricted domain is often discussed
(Pool 2006). This is related to the distinction of whether the lexicon is open or closed
(Adriaens and Schreurs 1992). We will use the letter code D to denote languages
targeting a specific and narrow domain. A further important difference is the one between
written and spoken languages. We will use W to denote languages that are intended
to be written, and S for those that are intended to be spoken. However, none of these
distinctions seems to describe a fundamental language property: Languages that
originated in one environment can later be used in another; the lexicon can later be declared
open or closed; written languages can be read aloud; and spoken languages can be
written down.

The rules that define a CNL can be proscriptive or prescriptive
(Nyberg, Mitamura, and Huijsen 2003), or a combination of the two. Proscriptive
rules describe what is not allowed, whereas prescriptive rules describe what is allowed.
Languages defined by proscriptive rules alone must have some starting point in the
form of a given (natural) language. Languages with only prescriptive rules, in contrast,
typically start from scratch. As we will see, there is a close connection of this distinction
to the concept of simplicity as introduced in the next section.

Table 1
Letter codes for properties of CNLs

Code   Property
C      The goal is communication among humans (comprehensibility)
T      The goal is translation
F      The goal is a natural representation for formal notations
W      The language is intended to be written
S      The language is intended to be spoken
D      The language targets a specific and narrow domain
A      The language originated from academia
I      The language originated from industry
G      The language originated from a government or a UN agency

Because of their lack of generality, we do not include here more specific
low-level properties such as the support for subclauses and free compounding
(Adriaens and Schreurs 1992), specific restrictions on grammatical tenses and
modal verbs (O'Brien 2003), and support for interrogative and imperative sentences
(Wyner et al. 2010).

Table 1 summarizes the letter codes. Any two of these properties can overlap, and
therefore any combination is possible in theory (with the exception that no language
should be neither W nor S).
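
As an illustration only, the letter codes and the single constraint on their combination (every language is W, S, or both) could be sketched in Python as follows. The names `Code` and `parse_codes` are hypothetical and not part of the scheme; the descriptions paraphrase the text above.

```python
from __future__ import annotations

from enum import Enum


class Code(Enum):
    """Letter codes for CNL properties (descriptions paraphrase the text)."""
    C = "communication among humans"
    T = "translation"
    F = "representation for formal notations"
    W = "intended to be written"
    S = "intended to be spoken"
    D = "specific and narrow domain"
    A = "originated from academia"
    I = "originated from industry"
    G = "originated from a government or a UN agency"


def parse_codes(text: str) -> frozenset[Code]:
    """Parse a string such as 'C W D I' into a set of letter codes.

    Raises ValueError for an unknown letter, or if the language is
    neither written (W) nor spoken (S).
    """
    codes = set()
    for letter in text.replace(" ", "").upper():
        if letter not in Code.__members__:
            raise ValueError(f"unknown letter code: {letter!r}")
        codes.add(Code[letter])
    if Code.W not in codes and Code.S not in codes:
        raise ValueError("a language must be W, S, or both")
    return frozenset(codes)
```

Any other combination of codes is accepted, which mirrors the statement that the properties can overlap freely.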

Finally, there is one additional aspect of constructed languages that deserves
attention: their life cycle. Some languages are not much more than abstract ideas, others
have left this stage being applied to concrete problems, and yet others have progressed
to widespread application in productive environments. At different stages of maturity,
languages can be discontinued or abandoned, which signifies the end of their life cycle.
Obviously, these different stages flow into each other and it is often difficult to name a
concrete year of birth or death (especially the latter, as most CNLs die silently). Where
possible, we will keep track of these life cycle properties.

3. PENS Classification Scheme 


As we have seen, the CNL properties introduced above describe application
domains rather than the languages themselves. Certainly, several fundamental language
properties have been identified and discussed in the literature, such as expressiveness
(Mitamura and Nyberg 1995; Boyd, Zowghi, and Farroukh 2005; Pool 2006),
complexity (Mitamura and Nyberg 1995), grammar modifications (Pool 2006),
understandability, natural look-and-feel, ambiguity, predictability, and formality of definition
(Wyner et al. 2010). However, these properties are all very fuzzy and do not allow for
strict categorization.

To construct a principled classification scheme for such fundamental language
properties, it makes sense to condense them to a few dimensions that are to a large
degree (though not entirely) independent of each other. Ambiguity, predictability, and
formality of definition can be subsumed by a dimension that we can call precision.
Expressiveness can make up a second dimension. Grammar modifications,
understandability, and natural look-and-feel can be combined to a dimension of naturalness. A
fourth dimension can be called complexity or, to have a dimension of the type "more
is better," simplicity. This is how we arrive at the four dimensions Precision,
Expressiveness, Naturalness, and Simplicity that underlie the PENS classification scheme.¹

It seems that all fundamental language properties mentioned in the existing liter¬ 
ature fall into one of these general dimensions, or can be broken down into different 
aspects that can be mapped to these dimensions. There are no strong dependencies 
between any two dimensions (for any dimension pair, it is easy to imagine languages 
that are at the top, bottom, and opposite ends in these two dimensions). Furthermore, 
there is no obvious dimension pair that could be merged in a meaningful way. Together, 
this seems to indicate that this set of dimensions is minimal yet complete. 

The development of this scheme originated from the insight that CNLs can be 
conceptually located somewhere in the gray area between natural languages on the 
one end and formal languages on the other. Generally, CNLs are more formal than 
natural languages but more natural than formal ones. For instance, a natural language 
such as English is very expressive, but complex and imprecise. A formal language 
such as propositional logic, in contrast, is very simple and precise, but at the same 
time unnatural and inexpressive. CNLs must be somewhere in the middle, but where 
exactly? 

It seems obvious that all four of the above-mentioned dimensions are continuous in 
nature or at least very fine-grained. In fact, one can argue that each of the dimensions 
is actually multidimensional and that representing it in one dimension is a rough 
simplification. Such simplifications are necessary, however, in order to get a precise 
measure for vague concepts such as expressiveness.

Intuitively, PENS uses a natural language such as English and a formal language 
such as propositional logic as pegs to span a conceptual space in which different 
kinds of controlled natural languages can be placed. In order to get a general but 
strict classification scheme, PENS drastically simplifies things by restricting each of 
its four dimensions to five classes, to be numbered from 1 to 5. These five classes are 
non-overlapping and consecutively cover the one-dimensional space between the two 
extremes: English on the one end and propositional logic on the other. For precision and 
simplicity, English is on the bottom end of the scale in class 1, which we write as P^1 and
S^1. Propositional logic is on the opposite end of the scale in class 5, represented with
P^5 and S^5. For expressiveness and naturalness, the roles are switched: English is at the
top end (E^5 and N^5) and propositional logic at the bottom (E^1 and N^1). In this way, the
scheme defines a conceptual space for CNLs that includes natural and formal languages
as special cases. Combining the four dimensions gives 5^4 = 625 classes, represented
with shorthand such as P^1E^5N^5S^1 for English and P^5E^1N^1S^5 for propositional logic. The
difficult and interesting part of this intellectual exercise is where and how to draw the 
borders between the five classes of each dimension. 
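
To make the structure of this conceptual space concrete, here is a minimal Python sketch. The names `PensClass`, `ENGLISH`, `PROPOSITIONAL_LOGIC`, and `ALL_CLASSES` are hypothetical and chosen only for illustration; the shorthand writes each class number after a caret, as in P^1E^5N^5S^1.

```python
from dataclasses import dataclass
from itertools import product

DIMENSIONS = ("P", "E", "N", "S")  # precision, expressiveness, naturalness, simplicity


@dataclass(frozen=True)
class PensClass:
    """One point in the PENS space; every dimension is a class from 1 to 5."""
    precision: int
    expressiveness: int
    naturalness: int
    simplicity: int

    def __post_init__(self):
        for name, value in zip(DIMENSIONS, self.as_tuple()):
            if value not in range(1, 6):
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def as_tuple(self):
        return (self.precision, self.expressiveness,
                self.naturalness, self.simplicity)

    def __str__(self):
        # Plain-text shorthand, with the class number written after a caret.
        return "".join(f"{d}^{v}" for d, v in zip(DIMENSIONS, self.as_tuple()))


# The two pegs that span the space, as described in the text.
ENGLISH = PensClass(1, 5, 5, 1)
PROPOSITIONAL_LOGIC = PensClass(5, 1, 1, 5)

# Every combination of five classes in four dimensions.
ALL_CLASSES = [PensClass(*c) for c in product(range(1, 6), repeat=4)]
```

Enumerating `ALL_CLASSES` yields the 625 combinations mentioned above, with English and propositional logic at opposite corners.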

The decision to use five classes for each dimension, and not four or six, is somewhat 
arbitrary. A larger number of classes allows for more detailed classifications, whereas 
it also gets more difficult to come up with strict and objective criteria to define these 
classes. Five seems to be a good middle ground. 


¹ These four dimensions were first sketched as "design principles" in the author's doctoral thesis
(Kuhn 2010), where "precision" was called "clearness."

3.1 Precision

The precision dimension of the PENS scheme captures the degree to which the meaning 
of a text in a certain language can be directly retrieved from its textual form, that is, 
the sequence of language symbols. Natural language is very imprecise in this sense, 
because a large amount of context information is needed to grasp the meaning of typical
sentences. Formal logic languages, on the other hand, have maximal precision, because 
their meaning is strictly defined solely on the basis of the possible sequences of their 
language symbols. The symbol grounding problem, that is, the problem of mapping 
symbols to their counterparts in the real world, is not considered here, because it affects 
all languages, including both natural and formal ones. On this precision dimension, 
languages are divided into the five classes P^1, P^2, P^3, P^4, and P^5 as follows:

Imprecise languages (P^1). Virtually every sentence of these languages is vague to a certain
degree. Without taking context into account, most sentences of a certain complexity are
ambiguous. The automatic interpretation of such languages is "AI-complete," which
means it is a problem for which no complete solutions are in sight. These languages
require a human reader to check whether a given statement is syntactically correct,
and include borderline statements on which readers disagree. The same applies to the
semantic properties of the language. All natural languages belong to this category.

Less imprecise languages (P^2). For these languages, the degree of ambiguity and
vagueness is considerably lower than in natural languages, and their interpretation depends
much less on context. They restrict the use and/or the meaning of a wide range of
the respective ambiguous, vague, or context-dependent constructs. However, these
constructs are still too dominant to make automatic interpretation reliable. Such languages
are typically not related to a formal (i.e., mathematically precise) underpinning.

Reliably interpretable languages (P^3). The syntax of these languages is heavily restricted,
though not necessarily formally defined. The restrictions are strong enough to make 
automatic interpretation reliable. There is a logical underpinning or at least a formal 
conceptual scheme, in which the semantics of sentences can be represented. However, 
the mapping of sentences to their formal representations is itself not defined in a fully 
formal way, but requires external background knowledge, heuristics, or user feedback. 

Deterministically interpretable languages (P^4). Such languages are fully formal on the
syntactic level; that is, they are (or can be) defined by a formal grammar. Each text 
in such a language can be deterministically parsed to a formal logic representation, or 
a small set of all possible representations (including all and only the possible ones). 
Based on the underlying formalism, these representations describe the meaning of 
the sentences, but they may be underspecified in the sense that they require certain 
parameters, background axioms, external resources, or heuristics to enable sensible 
deductions. 

Languages with fixed semantics (P^5). These languages are fully formal and fully specified
on both the syntactic and semantic levels. Each text has exactly one meaning, which can
be automatically derived. The circumstances in which inferences hold or do not hold are 
fully defined. What conclusions follow from a given text in the language (e.g., whether 
it is consistent and which sentences of the language are a consequence of the text) can 
be defined with mathematical rigor, without the help of heuristics or external resources. 
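
Read as a cascade, the five precision classes can be approximated by a few yes/no questions. The following Python sketch is a deliberate simplification under that assumption: the field names of `PrecisionTraits` are hypothetical, and answering the questions for a real language still requires human judgment.

```python
from dataclasses import dataclass


@dataclass
class PrecisionTraits:
    """Hypothetical yes/no summary of a language's precision-related traits."""
    restricts_ambiguity: bool = False      # markedly less ambiguous and vague than natural language
    reliably_interpretable: bool = False   # automatic interpretation is reliable
    formal_grammar: bool = False           # texts can be parsed deterministically
    fixed_semantics: bool = False          # semantics fully specified, no heuristics needed


def precision_class(traits: PrecisionTraits) -> int:
    """Return an approximate precision class from 1 (imprecise) to 5 (fixed semantics)."""
    if traits.formal_grammar and traits.fixed_semantics:
        return 5
    if traits.formal_grammar:
        return 4
    if traits.reliably_interpretable:
        return 3
    if traits.restricts_ambiguity:
        return 2
    return 1
```

For example, `precision_class(PrecisionTraits())` describes a natural language with none of the traits, whereas a language with a formal grammar but underspecified semantics lands in the fourth class.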

3.2 Expressiveness

The dimension of expressiveness describes the range of propositions that a certain 
language is able to express. A language X is more expressive than a language Y if 
language X can describe everything that language Y can, but not vice versa. The relation 
of "being more expressive" does not constitute a total order: For two given languages 
of nonequal expressiveness, it can be the case (and often is the case) that neither is more 
expressive than the other. This entails that ranking a general set of languages in a linear 
order according to their expressiveness cannot be done in a completely objective way. 
A classification scheme, such as the one presented here, must therefore rely on only 
a subset of all possible expressiveness features. These expressiveness features should
be general and important ones, and at the same time allow for a balanced and clear
discrimination between the languages to be classified. The PENS classification scheme
employs the following five expressiveness features:

(a) universal quantification over individuals (possibly limited)

(b) relations of arity greater than 1 (e.g., binary relations)

(c) general rule structures (if-then statements with multiple universal
quantification that can target all argument positions of relations)

(d) negation (strong negation or negation as failure)

(e) general second-order universal quantification over concepts and relations

For a feature to be considered fulfilled, it should be an integral part
of the language and not just manifested by a few special cases. There are a number of
other important features that could be considered, for example support for existential
quantification, equality, and types of supported speech acts (such as declarative,
interrogative, directive, and indirect speech acts). However, to achieve a simple classification
into a sequence of five classes, the features listed above will turn out to be sufficient
and lead to a classification that seems consistent with the intuitive understanding of
expressiveness.

Since this classification system should not only include declarative formal
languages but also informal as well as procedural ones, it makes sense to apply a weaker
notion of expressiveness than what is usually applied to logic languages. We can adopt
from the research on programming languages the convention that a certain language
construct adds expressiveness if its removal would require "a global reorganization of
the entire program" (Felleisen 1991). If a certain language construct allows us to express
something locally which would otherwise require us to reorganize the entire text, then
we say that this language construct makes the language more expressive. This means,
for example, that a language with second-order features relying on Henkin semantics
qualifies for the last criterion of the above list, even though Henkin semantics can be
reduced to first-order. A given set of statements written in a language with
Henkin-style second-order features cannot generally be reduced to first-order logic without
global reorganization, that is, changing statements that do not actually use second-order
features. With this qualification, we can define the five classes as follows:

Inexpressive languages (E^1). These are languages lacking one or both of the features (a)
and (b): They have no universal quantification or no relations of arity greater than 1.
Propositional logic belongs to this category.

Languages with low expressiveness (E^2). Such languages have both of the features (a) and
(b), but are not E^3-languages: They have universal quantification over individuals and
relations of arity greater than 1. Description logics belong to this category.

Languages with medium expressiveness (E^3). These languages have all of the features
(a), (b), (c), and (d), but are not E^4-languages: They have general rule structures and
negation, in addition to the features of E^2. First-order logic belongs to this category.

Languages with high expressiveness (E^4). Such languages have all listed features (a), (b), (c),
(d), and (e), but are not E^5-languages: They have second-order universal quantification
over concepts and relations, in addition to the features of E^3. Second-order predicate
calculus belongs to this category.

Languages with maximal expressiveness (E^5). These languages can express anything that
can be communicated between two human beings. Such languages cover any statement
in any type of logic. Obviously, this includes all features listed above. All natural
languages belong to this category. 
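
The five class definitions above amount to a small decision procedure over the features (a) to (e). The Python sketch below is one possible reading of it; the function name `expressiveness_class` and the `maximal` flag are hypothetical. The flag is needed because membership in the highest class (being able to express anything humans can communicate) cannot be derived from the five features alone.

```python
from __future__ import annotations


def expressiveness_class(features: set[str], maximal: bool = False) -> int:
    """Return the expressiveness class (1 to 5) for a set of feature letters.

    `features` contains the letters 'a' to 'e' of those features that are
    an integral part of the language. `maximal` marks languages that can
    express anything that humans can communicate.
    """
    unknown = features - set("abcde")
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    if maximal:
        return 5
    if not {"a", "b"} <= features:
        return 1  # lacks universal quantification or non-unary relations
    if not {"c", "d"} <= features:
        return 2  # has (a) and (b), but lacks rule structures or negation
    if "e" not in features:
        return 3  # has (a) to (d), but no second-order quantification
    return 4      # has all five features without being maximal


# Illustrative feature sets (assumptions made for this example only).
propositional_logic = {"d"}          # negation, but neither (a) nor (b)
first_order_logic = set("abcd")
second_order_logic = set("abcde")
```

Note that a language with features (a), (b), and (e) but without (c) or (d) still falls into the second class, because each class requires all features of the classes below it.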

3.3 Naturalness 

The dimension of naturalness describes how close the language is to a natural language 
in terms of readability and understandability to speakers of the given natural language. 
We define the five classes as follows: 

Unnatural languages (N^1). These are languages that do not look natural, making heavy
use of symbol characters, brackets, or unnatural keywords. It might be possible to use
natural words or phrases as names for certain entities, but this is neither required nor
further defined by the language.

Languages with dominant unnatural elements (N^2). Natural language words or phrases
are an integral part of such languages, but are dominated by unnatural elements or
unnatural statement structure, or have unnatural semantics. The natural elements do
not connect in a natural way to each other, and speakers of the given natural language
typically fail to intuitively understand the respective statements.

Languages with dominant natural elements (N^3). In such languages, natural elements are
dominant over unnatural ones and the general structure corresponds to natural language
grammar. Due to the remaining unnatural elements or unnatural combination
of elements, however, the sentences cannot be considered valid natural sentences.
Speakers of the given natural language do not recognize the statements as well-formed
sentences of their language, but are nevertheless able to intuitively understand them to
a substantial degree.

Languages with natural sentences (N^4). These are languages with sentences that can be
considered valid natural sentences. Speakers of the respective natural language recognize
the statements as sentences of their language and are able to correctly understand
their essence without instructions or training. Minor or infrequent exceptions and
unnatural means for clarification (including text color, indentation, hyphenation, and
capitalization) are permitted as long as they do not disturb the natural look-and-feel
and the natural flow of the sentence. Parentheses and brackets in unnatural positions,
however, in most cases do disturb the natural text flow considerably, and are therefore
typically not present in this category. While single sentences have a natural flow, this
does not scale up to complete texts or documents. Complete texts in such languages
seem very clumsy and repetitive, and lack a natural text flow.

Languages with natural texts (N^5). With these languages, complete texts and documents
can be written in a natural style, with a natural text flow, and with natural semantics.
In the case of spoken languages, complete dialogs can be produced with a natural flow
and a natural combination of speech acts.

We can now be a little more precise concerning our definition of CNL. Property number 
3 of the long version of the definition shown in Section IZTl says that a CNL "preserves 
most of the natural properties of its base language, so that speakers of the base language 
can intuitively and correctly understand texts in the controlled natural language, at 
least to a substantial degree." We will interpret this in such a way that it only includes 
languages of naturalness N^3 and higher. Thus, by this definition, there are no CNLs with
N^1 or N^2.

3.4 Simplicity 

The fourth dimension is a measure of the simplicity or complexity of an exact and
comprehensive language description covering syntax and semantics, if such a complete
description is possible at all. This description should not presuppose intuitive
knowledge about any natural language. It is therefore not primarily a measure for the
effort needed by a human to learn the language, nor does it capture the theoretical
complexity of the language (as, for example, the Chomsky hierarchy does). Rather, it
is closely related to the effort needed to fully implement syntax and semantics of the
language in a mathematical model, such as a computer program.

The PENS scheme applies a very pragmatic and simple indicator for simplicity: the
number of pages in natural language needed to describe the language in an exact
and comprehensive way. For languages for which no such exact and comprehensive
descriptions exist or can be written (that do not presuppose linguistic knowledge on
the side of the reader, and given the current state of science), we can distinguish
languages with the complexity of natural language from languages with considerably
lower complexity.

These "exact and comprehensive descriptions" should define all syntactic and
semantic properties of the language using accepted grammar notations to define the
syntax and accepted mathematical or logical notations to define the semantics. They are
assumed to use scientific writing style as found in scientific articles or technical reports,
and should allow a skilled grammar engineer to implement a correct and complete
parser within reasonable time. The page count should be based on a one-column format
with up to about 700 words per page. It is important to note that the criterion is not the
presence of such a description but whether it is possible or not to write one.

In order to treat languages with fixed vocabularies and those with extensible ones 
in an equal way, the above mentioned language descriptions do not need to include the 
vocabularies. Concretely, the five classes are defined as follows: 

Very complex languages (S^1). These languages have the complexity of natural languages.
They cannot be described in an exact and comprehensive manner.




Computational Linguistics 


Languages without exhaustive descriptions (S^2). These are languages that are considerably
simpler than natural languages, in the sense that a significant part of the complex
structures are eliminated or heavily restricted. Still, they are too complex to be described
in an exact and comprehensive manner. Usually, the definitions of such languages just
describe restrictions on top of a given natural language that is taken for granted.

Languages with lengthy descriptions (S^3). Such languages can be defined in an exact and
comprehensive manner, but it requires more than ten pages to do so.

Languages with short descriptions (S^4). These are languages for which an exact and
comprehensive description requires more than one page but not more than ten pages.

Languages with very short descriptions (S^5). Such very simple languages can be described
in an exact and comprehensive manner on a single page.

S^1 and S^2 are considered complex because they rely on a given natural language.
Coming back to a distinction briefly introduced in the previous section, such languages
are typically defined by proscriptive rules, describing what is not allowed compared to
the full language. S^3, S^4, and S^5, in contrast, typically use prescriptive rules that define the
language from scratch. For that reason, they are simpler in our sense of the word than
languages of the first type, which "import" the complexity of full natural language.
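
The page-count criterion for the simplicity dimension can be written down as a short function. This is a minimal sketch, not part of the PENS scheme itself: the function name, the use of `None` to signal that no exact and comprehensive description can be written, and the `natural_complexity` flag are assumptions introduced for illustration.

```python
from typing import Optional


def simplicity_class(pages: Optional[float],
                     natural_complexity: bool = False) -> int:
    """Return the simplicity class (1 to 5) for a language.

    pages: length of an exact and comprehensive description (one-column
        pages of up to about 700 words), or None if no such description
        can be written.
    natural_complexity: only consulted when pages is None; True if the
        language has the complexity of a natural language.
    """
    if pages is None:
        return 1 if natural_complexity else 2
    if pages <= 0:
        raise ValueError("page count must be positive")
    if pages > 10:
        return 3  # lengthy description
    if pages > 1:
        return 4  # short description
    return 5      # very short description: a single page


print(simplicity_class(None, natural_complexity=True))
print(simplicity_class(25))
print(simplicity_class(0.5))
```

The sketch treats exactly ten pages as a short description and exactly one page as a very short one, following the boundaries stated in the class definitions.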

Before we move on to apply this scheme, it should be stressed that PENS is designed to
measure the nature of a language, not its quality or usefulness. It should be used to describe
languages, not to rank them. As the "perfect" language does not exist, compromises
have to be made. Depending on application area, environment, and goal, different
weights are assigned to the PENS dimensions, and therefore different optimal levels
result. In theory, more is better for each of the PENS dimensions, but this does not
necessarily hold in practice. A certain level in any of the dimensions is often good enough for
a given application domain, and going beyond that level brings no additional benefit.
Furthermore, as we restrict ourselves to just five classes per dimension, there can be
relatively large differences within one class. It is inevitable that two languages in the
same class can be farther apart in the respective dimension than two languages in
adjacent classes. Even if a language has higher PENS values in every dimension than
another language, this does not mean that the former is "better" in any meaningful
sense of the word. Having a high PENS score for expressiveness, for example, just
means that the general expressiveness level is high, and not that the language is able to
express each and every statement of all languages with a lower score. Similarly, having
a high score for naturalness does not mean that all aspects of the language are more
natural as compared to all languages with a lower score.

4. Languages 

We can now turn to the actual survey. For practical reasons, we restrict ourselves here
to English-based languages, leaving out CNLs that are based on other languages, such
as Chinese, French, German, Greek, Spanish, and Japanese (Pool 2006). To give an
overview of the different existing English-based CNLs, twelve important and influential
languages are introduced below. The complete list can be found in the appendix;
surprisingly, we ended up with exactly 100 languages. In addition, a handful of other






Tobias Kuhn 


A Survey and Classification of Controlled Natural Languages 


languages for comparison are introduced below, such as natural English and propositional
logic. Each language is classified according to the nine properties with letter codes
and the PENS scheme. A best guess is made in the cases where not enough information
is available. The descriptions in the appendix are shorter in the case of similar languages
or scarce information. This data set is also available online as a CSV table (footnote 2).

There are many user interface approaches based on some sort of natural language 
input, and it could be argued that they all — at least indirectly — define and use 
a controlled language, because none of them is able to correctly process full natural 
language. Such approaches, however, are included here only if the restrictions on the 
language are considered an inherent property of the approach and not a shortcoming 
of its implementation. In other words, the following listing excludes languages whose 
restrictions are not design decisions of the general approach but practical concessions, 
for example Warren and Pereira (1982). The same criterion is applied to verbalization
approaches, which inevitably define a restricted version of the respective language that
could be considered a CNL, for example Halpin (2004), Jarrar, Keet, and Dongilli (2006),
and Lukichev and Wagner (2006). Other languages follow an approach called conceptual
authoring or WYSIWYM (Hallett, Scott, and Power 2007) where texts are created
by short cycles of language generation and user-triggered modification actions. We
include such languages here, because in this case the restrictions on the language are
an important aspect of the approach. Finally, it should be mentioned that we leave out
fictional languages, such as Newspeak of George Orwell's Nineteen Eighty-Four.

Languages that do not have an official name are introduced by a "generic name
in quotation marks." Unless stated otherwise, quotes and examples are taken from the
publications cited in the beginning of each paragraph.


4.1 English-based Controlled Languages 


Below, twelve selected CNLs are introduced, roughly in chronological order of their
first appearance or the first appearance of similar predecessor languages. For this small
sample, languages are chosen that were influential, are well-documented, and/or are
sufficiently different from the other languages of the sample.

"Sowa's syllogisms" (Sowa 2000b) are simple logic languages based on the syllogisms
originally introduced by Aristotle (ca. 350 BC). Sowa was probably the first to bring
them into the context of CNL, claiming that they are the first reported instance of a
controlled natural language. Because this survey is restricted to English, Sowa's version
of the syllogisms is listed here instead of Aristotle's original version in ancient Greek.
The complete language can be described by just four simple sentence patterns:

Every A is a B. Some A is a B. No A is a B. Some A is not a B. 

A and B can be any English common nouns such as cat and animal. This language is very
similar to the language E0 presented and studied by Pratt-Hartmann (2004), who used
some additional patterns:

Every A is not a B. No A is not a B. P is a B. P is not a B. 

Here, P can be any English proper name such as Socrates. We will use the term "Sowa's
syllogisms" in a sense that includes such similar approaches. The semantics of syllogisms
is also very easy to define. The first four patterns shown above can be mapped to
first-order logic like this (and similarly for the other patterns):

Every A is a B.       ∀x (A(x) → B(x))
Some A is a B.        ∃x (A(x) ∧ B(x))
No A is a B.          ¬∃x (A(x) ∧ B(x))
Some A is not a B.    ∃x (A(x) ∧ ¬B(x))
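
Because the language consists of only four sentence patterns, a translator to first-order logic fits in a few lines. The following Python sketch is an illustration under stated assumptions and is not taken from the cited publications: the function name `to_fol`, the ASCII notation for quantifiers and connectives, and the restriction of A and B to single words are all choices made here.

```python
import re

# Each entry pairs one sentence pattern with a first-order template.
# ASCII notation: "forall", "exists", "->", "&", and "not".
PATTERNS = [
    (re.compile(r"^Every (\w+) is an? (\w+)\.$"),
     "forall x. ({A}(x) -> {B}(x))"),
    (re.compile(r"^Some (\w+) is not an? (\w+)\.$"),
     "exists x. ({A}(x) & not {B}(x))"),
    (re.compile(r"^Some (\w+) is an? (\w+)\.$"),
     "exists x. ({A}(x) & {B}(x))"),
    (re.compile(r"^No (\w+) is an? (\w+)\.$"),
     "not exists x. ({A}(x) & {B}(x))"),
]


def to_fol(sentence: str) -> str:
    """Translate one syllogistic sentence into a first-order formula."""
    text = sentence.strip()
    for regex, template in PATTERNS:
        match = regex.match(text)
        if match:
            noun_a, noun_b = match.groups()
            return template.format(A=noun_a.lower(), B=noun_b.lower())
    raise ValueError(f"not one of the four sentence patterns: {sentence!r}")


print(to_fol("Every cat is an animal."))
print(to_fol("Some cat is not a dog."))
```

Sentences outside the four patterns are rejected with a `ValueError`, which mirrors the idea that the language is defined from scratch rather than as a restriction of full English.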


2 http://purl.org/tkuhn/cnlsurvey/data 
Hereby, we have an exact and comprehensive description of the language, taking just
a couple of lines. Despite the simple structure of the language, the sentences are perfectly
natural. Its expressiveness, however, is very restricted: Only very simple sentence
structures are covered and only one-place relations are supported. — P^E^N^S®, F W A

Basic English (Ogden 1930) is a language presented in 1930 that should improve
communication among people around the globe. It is the first reported instance of a
controlled version of English, at least the first one that received broader recognition.
It influenced Caterpillar Fundamental English, which itself became a very influential
language. Basic English was designed as a common basis for communication in politics,
economy, and science. It restricts the grammar and makes use of only 850 English root
words. The restrictions are arguably most drastic in the case of verbs. Only 18 verbs
are supported: put, take, give, get, come, go, make, keep, let, do, be, seem, have, may,
will, say, see, and send. These verbs can be combined with prepositions to form more
specific relations such as put in to express insert. Other verbs can be expressed with the
help of nouns, such as give a move instead of using move as a verb. The usage of the
given words and their variants is described by informal grammar rules, for example
"Collective nouns may be formed from adjectives when used with the." These are two
examples of sentences in Basic English:

The camera man who made an attempt to take a moving picture of the society women, 
before they got their hats off, did not get off the ship till he was questioned by the 
police. 

It was his view that in another hundred years Britain will be a second-rate power. 

Many variations exist that use larger word sets. The Simple English version of
Wikipedia (footnote 3), for example, claims to use Basic English, but in fact uses a much less
restricted language. Basic English is still used today and promoted by a dedicated
Basic-English Institute (footnote 4). Many texts have been written in this language, including textbooks,
novels, and large parts of the Bible. The drastic simplifications on the lexical level
together with the grammatical restrictions constitute a significant gain in precision
compared to full English. Still, any type of topic can be expressed with a natural text
flow. The informal restrictions on the grammar, however, are not strong enough to
significantly reduce the complexity of the language (in the PENS sense of complexity).
— P^E^N^S^, C W

E-Prime or E' (Bourland 1965) is a restricted version of English with the only restriction
being that the verb to be must not be used. This includes all inflectional forms such as
are, was, and being, regardless of whether used as auxiliary or main verb. The language
was presented in 1965 but the idea goes back to the late 1940s. The motivation for the use
of E-Prime is the belief that "dangers and inadequacies [...] can result from the careless,
unthinking, automatic use of the verb 'to be'." E-Prime is claimed by its proponents to
enhance clarity. The statement "We do this because it is right" would not be allowed,
but one would have to rephrase it in a way that does not include to be, for example:

We do this thing because we sincerely desire to minimize the discrepancies between 
our actions and our stated "ideals." 

In the area of natural language processing, however, the verb to be is not considered one
of the most difficult problems, which is good evidence that E-Prime is not considerably

3 http://simple.wikipedia.org

4 http://www.basic-english.org 




more precise than full English in the PENS sense. Also in terms of complexity it is not
considerably different from full English, because words such as become and exist are
allowed that can replace the forbidden to be in most cases. On the other hand, it seems
true that it is always possible to rephrase a text without the use of to be in a way that is
fully natural though possibly longer than the original. — P^E^N^S^, C W A
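
Since E-Prime has a single restriction, compliance can be approximated by a word-level check. The sketch below is a simplification under stated assumptions: the function names and the list of inflectional forms are chosen here for illustration, and contractions such as "it's" or "they're" are not detected, so a real checker would need more than this.

```python
import re

# Inflectional forms of "to be" (assumed list, written out for illustration).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def be_forms_in(text: str) -> list:
    """Return every form of 'to be' found in the text, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word in BE_FORMS]


def is_e_prime(text: str) -> bool:
    """True if the text contains no (uncontracted) form of 'to be'."""
    return not be_forms_in(text)


print(be_forms_in("We do this because it is right."))
print(is_e_prime("We do this thing because we sincerely desire to minimize "
                 "the discrepancies between our actions and our stated ideals."))
```

That a check this small captures the entire restriction illustrates why E-Prime differs so little from full English in terms of complexity.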


Caterpillar Fundamental English (CFE) (Verbeke 1973) was an influential controlled
language developed at Caterpillar. It was officially introduced in 1971, was based on
Basic English (Smart 2003), and has been reported to be the earliest industry-based
CNL (Wojcik and Hoard 1997). The need for a controlled language emerged because
of the increasing sophistication of Caterpillar's products and the need to communicate
with non-English speaking service personnel in different countries (Verbeke 1973): "To
summarize the problem: There are more than 20,000 publications that must be understood
by thousands of people speaking more than 50 different languages." The idea of
CFE was "to eliminate the need to translate service manuals" (Kamprath et al. 1998).
A trained, non-English speaking mechanic familiar with Caterpillar's products should
be able to understand the language after completing a course on CFE consisting of
30 lessons. The vocabulary of the language is restricted to around 800 to 1,000 words
(Crabbe 2009), with only one meaning defined for each of them (e.g., right only as
the opposite of left). Still, many of the words "had broad semantic scope and it
was assumed that they would be disambiguated in context by the human reader"
(Kamprath et al. 1998). The following ten rules summarize the grammatical restrictions
(Crabbe 2009):


1. Make positive statements.

2. Avoid long and complicated sentences.

3. Avoid too many subjects in one sentence.

4. Avoid too many successive adjectives and nouns.

5. Use uniform sentence structures.

6. Avoid complicated past and future tenses.

7. Avoid conditional tenses.

8. Avoid abbreviations, contractions, and colloquialisms.

9. Use punctuation correctly.

10. Use consistent nomenclature.

These are two examples of CFE sentences:

The maximum endplay is .005 inch.

Lift heavy objects with a lifting beam only.

CFE was discontinued by Caterpillar in 1982, because (among other reasons) "the
basic guidelines of CFE were not enforceable in the English documents produced"
(Kamprath et al. 1998). As a result, Caterpillar Technical English (see appendix) was
developed following a different approach: The restrictions on the language should be
enforceable, and should reduce translation costs instead of trying to eliminate the need
for translations altogether. The strong lexical restrictions together with some grammatical
constraints make CFE more precise than full English, but it is not considerably
different in terms of expressiveness, naturalness, and complexity. — P^E^N^S^, C W D I

FAA Air Traffic Control Phraseology (FAA 2010) is a controlled language defined by
the Federal Aviation Administration (FAA) and used for the communication in air traffic
coordination, going back to at least the early 1980s. There are other very similar languages
for air traffic control such as the ICAO and CAA phraseologies. To a large extent,
these languages are indistinguishable from each other, and together they are sometimes
called AirSpeak (Robertson 1987). The FAA Phraseology is defined by more than 300
fixed sentence patterns such as "(ACID), IN THE EVENT OF MISSED APPROACH
(issue traffic). TAXIING AIRCRAFT/VEHICLE LEFT/RIGHT OF RUNWAY." This is
an example of a statement following that pattern:

United 623, in the event of missed approach, taxiing aircraft right of runway.

In addition to these explicit patterns, there are many more implicit patterns defined 
in prose form, for example "Issue advisory information on [...] bird activity. Include 
position, species or size of birds, if known, course of flight, and altitude." The following 
statement is an example that corresponds to this implicit pattern:

Flock of geese, one o'clock, seven miles, northbound, last reported at four thousand. 

Vocabulary and semantics are restricted too, for example "Use the word gain and/or loss
when describing to pilots the effects of wind shear on airspeed." Phraseology statements 
can be mixed with statements in full English in cases where no pattern exists to express 
the desired message. The language is heavily restricted and much less ambiguous than 
full English. It is inexpressive in the sense that no universal quantification is supported, 
and is not sufficiently restricted to make an exact and exhaustive description feasible. 
— P^ElN^S^, C S D G 

ASD Simplified Technical English (ASD-STE) (ASD 2013), often abbreviated to Simplified
Technical English (STE) or just Simplified English, is a CNL for the aerospace
industry. Originally inspired by a language called ILSAM (Adriaens and Schreurs 1992),
the language had its origins in 1979, but it was only in 1986 that it was officially presented
for the first time, then under the name AECMA Simplified English. It received
its current name in 2004 when AECMA merged with two other associations to form
ASD. The main purpose of the language is to make texts easier to understand, especially
for non-native speakers. While AECMA Simplified English was designed to make
translation into other languages unnecessary, one of the original goals of ASD-STE was
to improve translation. Today, the language is maintained by the Simplified Technical
English Maintenance Group. ASD-STE is based on English with restrictions expressed
in about 60 general rules. These rules restrict the language on the lexical level (e.g., "Use
approved words from the Dictionary only as the part of speech given"), on the syntactic
level (e.g., "Do not make noun clusters of more than three nouns"), as well as on the
semantic level (e.g., "Keep to the approved meaning of a word in the Dictionary. Do not
use the word with any other meaning."). There is a fixed vocabulary consisting of terms
common to the aerospace domain. Additionally, user-defined "Technical Names" and
"Technical Verbs" can be introduced. This is an exemplary excerpt of a text in ASD-STE:

These safety precautions are the minimum necessary for work in a fuel tank. But the
local regulations can make other safety precautions necessary.

Even though its restrictions make ASD-STE considerably more precise than full English,
it does not allow for reliable automatic interpretation. Full expressiveness and full
naturalness of unconstrained English are retained, but also its complexity. — P^E^N^S^,
C T W D I


Standard Language (SLANG) (Rychtyckyj 2002, 2005) is a language developed at Ford
Motor Company starting from 1990. It is designed for process sheets containing build
instructions for component and vehicle assembly plants. It is still used at Ford and
has been continually extended and updated to reflect technical and business-related
advances. With SLANG, engineers can write instructions that are clear and concise
and at the same time machine-readable. Based on these instructions, the system can,
among other things, automatically generate a list of required elements and calculate
labor times. In addition, the restricted nature of the language is exploited to translate
such instructions with the help of machine translation for their use in assembly plants
in different countries. All SLANG sentences are in imperative mood and follow a certain
general pattern starting with a main verb and followed by a noun phrase. There are
additional restrictions on vocabulary and semantics. These are two exemplary sentences:

OBTAIN ENGINE BLOCK HEATER ASSEMBLY FROM STOCK

APPLY GREASE TO RUBBER O-RING AND CORE OPENING 

A parser is used to check for compliance with the restrictions. English grammar is 
followed with some minor deviations: For example, articles can be dropped and some 
kinds of modifiers can be used in unnatural ways. — P^E^N^S^, C F W D I 

SBVR Structured English (OMG 2008) is a CNL for business rules first presented
around 2005. It is part of the Semantics of Business Vocabulary and Business Rules
(SBVR) standard. It was probably influenced by a language called RuleSpeak that is
very similar and was first presented in 1994. The vocabulary is extensible and consists
of four types of sentence constituents: terms (i.e., concepts), names (i.e., individuals),
verbs (i.e., relations), and keywords (e.g., fixed phrases, quantifiers, and determiners).
Each of these has its own color and style, as the following examples show:

A rental must be guaranteed by a credit card before a car is assigned to the rental.

Rentals by Booking Mode contains the categories 'advance rental' and 'walk-in rental.'

The SBVR standard provides formal semantics based on second-order logic with
Henkin semantics. The second of the above examples makes use of the second-order
features. Some keywords have a precise meaning, such as or meaning inclusive logical
disjunction (unless followed by but not both). Other keywords, however, are less precise,
such as the determiner a being defined as "universal or existential quantification,
depending on context based on English rules." The language strictly defines the permissible
sentence constituents, but is much less strict in defining the order in which these
constituents can be put. The syntax structure can be ambiguous (e.g., when using and
and or in the same sentence), and so can be quantifier scopes and anaphoric references.
There is no formal grammar of the language, and its definition depends to some degree
on the linguistic understanding of a human reader. — P^E^N^S^, C F W I

Attempto Controlled English (ACE) (Fuchs, Kaljurand, and Kuhn 2008) is a CNL with
an automatic and unambiguous translation into first-order logic. ACE was first presented
in 1996 as a language for software specifications. Later, the focus shifted towards
knowledge representation and the Semantic Web. The language has been extended over
the years in various ways. The most notable features of ACE include complex noun
phrases, plurals, anaphoric references, subordinated clauses, modality, and questions.
These are two exemplary ACE sentences:

A customer owns a card that is invalid or that is damaged. 

Every continent that is not Antarctica contains at least 2 countries. 

ACE sentences are deterministically mapped to discourse representation structures, a
notational variant of first-order logic. These expressions, however, are underspecified
in the sense that many deductions (e.g., when involving plurals or modal verbs) require
external background axioms that are not fixed by the ACE definition (these axioms
are external in the sense that they are not necessarily expressible in ACE). This makes
it possible to use ACE in different areas such as ontology editors, rule systems, and
general reasoners with semantics that are not fully compatible. ACE is, with a few minor
exceptions, fully natural on the sentence level, but longer texts do not have a natural text
flow. Recently, ACE has also been used in the context of rule-based machine translation
(Kaljurand and Kuhn 2013), but translation was not a stated goal during the design of
the language. — P'‘E^N'‘S'^, F W A

"Drafter Language" (Power and Scott 1998) is a CNL used in a system called Drafter-II.
The language is used for instructions to word processors and diary managers. The
system employs a conceptual authoring approach: Users cannot directly edit the CNL 
text, but they can only trigger modification actions starting from a small stub sentence. 
In this way, incomplete statements are gradually completed by the user. The following 
example is a sequence of two incomplete statements showing one such completion step: 

Schedule this event by applying this method. 

Schedule the appointment by applying this method. 

The first sentence has two missing parts: this event and this method. At this point, the user 
can choose, for example, the appointment to fill in the first missing part, which leads to 
the second statement, which is still incomplete but has only one missing part left. Once 
a statement is completed, Drafter-II internally maps it to Prolog expressions, which 
are then automatically executed. As structural ambiguity can be resolved based on the
given sequence of modification actions, languages following the conceptual authoring
approach typically do not attempt to fully eliminate structural ambiguity. A given text
can have multiple parse trees, only one of which corresponds to the way it was created.
— P^plN^S^, F W D A 

E2V (Pratt-Hartmann 2003) is a controlled language that was introduced in 2001 and
corresponds to the language E3 studied in later work (Pratt-Hartmann 2004). The ultimate
goal is "to provide useable tools for natural language system specification." E2V
deterministically maps to the two-variable fragment of first-order logic. Because of this,
satisfiability of E2V sentences and texts is decidable, and the satisfiability problem is
NEXPTIME-complete. Two examples of E2V sentences are shown here:

Some artist does not despise every beekeeper. 

Every artist who employs a carpenter despises every beekeeper who admires him. 

The language is defined by 15 simple grammar rules plus nine predefined lexical rules 
for general words such as every and does not. A separate, user-defined lexicon contains 
the domain-specific words such as artist and admires. Altogether, E2V is a precise, 
natural, simple, but relatively inexpressive controlled language. — P^E^N^S^, F W A

Formalized-English (FE) (Martin 2002) is a CNL for knowledge representation. It is
based on Conceptual Graphs and the Knowledge Interchange Format (KIF), and focuses
on expressiveness. It covers a wide range of features including general universal
quantification, negation, contexts (statements about statements), lambda abstractions,
possibility, collections, intervals, and higher-order statements (reducible to first-order
logic). Two examples of statements in FE are shown here (the second one is
higher-order):

At least 93% of [bird with chrc a good health] can be agent of a flight. 

If 'a binaryRelationType *rt has for chrc the transitivity' then 'if "x has for ht 'y that has 

for ht 'z' then 'N has for *rt "z''. 

FE looks natural for simple statements, but becomes quite unnatural for more complex
ones. This is due to unnatural use of parentheses, quotation marks, variables, and
keywords such as chrc. The syntax of the language is defined by about 50 rules in a
parser generator language. — P^E^N^S^, F W A

4.2 Languages for comparison

For the analysis to be described in the next section, we will use the following languages 
for comparison, which are not CNLs according to our definition: 

English is our representative of a natural language. — P^E^N^S^, C w S 

Propositional logic is a very basic logic language. — P^E^N^S®, F w A 

First-order logic can be considered an extension of propositional logic. It is more
expressive, but also more complex. — P^E^N^S^, F w A

COBOL is one of the oldest programming languages, which some call a controlled
natural language (Sowa 2000a). This is an exemplary COBOL statement:

PERFORM P WITH TEST BEFORE VARYING C FROM 1 BY 2 UNTIL C GREATER 
THAN 10. 

Although COBOL uses natural phrases where other programming languages use symbols
or short keywords, the statement structure does not really follow natural grammar.
For that reason, we do not consider it a CNL. — P^E^N^S^, F w A I G

Manchester OWL syntax (Horridge et al. 2006) is a user-friendly syntax for the ontology
language OWL. This is an exemplary expression:

Pizza and not (hasTopping some FishTopping) and not (hasTopping some 
MeatTopping) 

Instead of logical symbols, natural words such as not and some are used. The general 
structure, however, resembles formal and not natural languages, which is why we do 
not consider it a CNL. — P^E^N^S'^, F w A 

Naturally, there are many more languages that could be used for comparison, but the 
list above seems to be a good sample. 

5. Analysis 

The data presented in the previous section and in the appendix allow for different kinds 
of aggregations and analyses. In particular, the classes and properties of the observed 
languages and the timeline of their evolution are interesting. 

5.1 PENS Classes 

Table 2 summarizes the PENS classes and properties of the discussed CNLs. Some
interesting patterns can be found in these data. Theoretically, there are 5^4 = 625 possible
PENS classes, but not all of them are observed "in the wild." Some are even practically
impossible, as far as we can tell, such as the perfect class P5E5N5S5. The collected CNLs
cover 25 distinct classes, which might seem a small number with respect to the entire
PENS space, but they are, as we will see, widely scattered. Even though some hotspots
of classes and properties can be identified, the languages exhibit a broad variety.
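
The size of the PENS space can be checked with a few lines of code. The following is a
minimal sketch in Python and is not part of the original survey: the names
all_pens_classes and format_class are hypothetical helpers, and the only figures taken
from the text are the five levels per dimension and the 25 observed classes.

```python
from itertools import product

# Each of the four PENS dimensions (precision, expressiveness,
# naturalness, simplicity) is rated on a five-level scale.
LEVELS = range(1, 6)

# Number of distinct classes observed in the survey, as stated in the text.
OBSERVED_CLASSES = 25


def all_pens_classes():
    """Enumerate every theoretically possible (P, E, N, S) combination."""
    return list(product(LEVELS, repeat=4))


def format_class(p, e, n, s):
    """Render a class tuple in the P..E..N..S.. notation (hypothetical helper)."""
    return f"P{p}E{e}N{n}S{s}"


if __name__ == "__main__":
    classes = all_pens_classes()
    share = OBSERVED_CLASSES / len(classes)
    print(len(classes), "possible classes")
    print(f"observed share of the space: {share:.0%}")
    print("perfect class:", format_class(*classes[-1]))
```

Under these assumptions, the 25 observed classes amount to 4% of the 625 theoretically
possible combinations.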

Visualization of the languages in the conceptual space can give us a better picture 
of the data. Since the PENS scheme is four-dimensional, it is difficult to visualize all 
dimensions in a single diagram. Figure 1 shows a diagram for each of the six possible
dimension pairs: The dots represent CNLs in comparison to natural languages such 
as English (white dot) and common formal languages (black dots). Note that the dots 
represent PENS classes and not individual languages. 

Table 2

Observed PENS classes and properties of CNLs (sorted by PENS class) 


class 


P^EiN^S^ 

P^E^N'^S* 

P^E^N^S* 


P^E^N'^S^ 

P^E^N'^S^ 

P^E^N^S^ 

P^E^N^S^ 

p3e4n'‘s2 

Fe^n'^s^ 

FE^N'^S'' 

Fe^n^s^ 

Fe^n'^s^ 


FE^N'^S^ 

Fe^n^s^ 

pSEiN^S^ 

pSEiN^S^ 

pSEiN^sS 

P5e2n3s4 

P5e2n4s3 

P5e2n4s4 


P5e3n3s3 

P^E^N^S^ 


P^E'iN^S^ 


properties languages 
C T W I IBM's EasyEnglish 

C w S G Special English 

C w A E-Prime 

C w G Plain Language 

C S D G CAA Phraseology, FAA Phraseology, ICAO Phraseology, PoliceSpeak, SEASPEAK

C w D I Airbus Warning Language 

F W A AIDA 

C T w D A I ALCOGRAM, COGRAM 
C T w D A CLCM 

C T w D I ASD-STE, Avaya CE, Bull GE, CTE, CASL, CE at Douglas, DCE, General Motors GE, PACE, Sun Proof
C T w D Wycliffe Associates' EasyEnglish 
C T w I iCE, SMART Controlled English 

C w D I AECMA-SE, CFE, CASE, CE at Clark, CE at IBM, CE at Rockwell, EE, HELP, ILSAM, KISL, NCR FE

C w D G Massachusetts Legislative Drafting Language 

C w I Boeing Technical English, NSE, SMART Plain English 

C w Basic English 

T W D I MCE, Oce Controlled English 

T W A KCE

T W I CLOUT 

C F W D I SLANG 

F S D I Voice Actions 

F W D A RNLS 

F W A ClearTalk 

F W I ITA CE 

F W I CPL 

C F w I RuleSpeak, SBVR-SE 

F W D A Drafter Language, MILE Query Language 

F W A Quelo Controlled English 

T F D A PILLS Language 

F W D A Atomate Language 

F W A I Gellish English 

F W A GINO's Guided English 

F W I CELT 

F W D A PROSPER CE 

F W A ACE 

F W D A ICONOCLAST Language 
F W D A CLEF Query Language
F W A Ginseng's Guided English 

F W D A Coral's Controlled English 

F W A PathOnt CNL

F W A Sowa's syllogisms 

F W D A I TBNLS 

F W A OWLPath's Guided English, SQUALL 
F W A CPE, CLIP, OWL ACE, SOS 

F W D A BioQuery-CNL, PERMIS CNL, ucsCNL 
F W A CLOnE, DL-English, E2V, Lite Natural Language, OSE 
F W G Rabbit 

F W D A CLM, ForTheL, Naproche CNL
F W A CLCE, PNL 

F W D A Gherkin 

F W A G RECON 

F W A First Order English, PENG, PENG-D, PENG Light 
F W I iLastic Controlled English 

F W A FE


Figure 1
Visualization of the PENS dimensions of existing CNLs, as compared to natural languages
(white dot) and common formal languages (black dots). Each dot represents a PENS class
containing one or more languages.

It is evident that the CNLs are widely scattered between the two extreme cases of
natural English (white dot) and propositional logic (black dot in the corner). Seen from
any angle, the set of existing CNLs exhibits wide variation. Except for the subspace
with a naturalness level of less than 3, where there can be no CNLs by our definition,
they cover a large part of the conceptual space. This indicates that PENS is a powerful
scheme for distinguishing different CNLs.

The diagrams also show that the CNL classes form one single cloud, from any 
perspective, and not two or more disconnected clouds. This means that it would be 
difficult to come up with a clean categorization scheme that would subdivide the large 
and diverse set of existing CNLs. This seems to justify the decision of using the term 
CNL in a broad sense and not replacing it by more specific terms. 

For several dimension pairs, strong correlations are observed. Precision and simplicity
are positively correlated: More precise languages tend to be simpler (Spearman's
rank correlation coefficient ρ = 0.90, using individual languages as data points and
excluding the languages for comparison). Expressiveness and simplicity exhibit a strong
negative correlation: More expressive languages tend to be more complex (ρ = -0.82).
In addition, naturalness/expressiveness are strongly positively (ρ = 0.77) and
naturalness/simplicity strongly negatively correlated (ρ = -0.76). To a slightly lesser
degree, negative correlation values are obtained for the pairs precision/naturalness
(ρ = -0.67) and precision/expressiveness (ρ = -0.66). These observations seem to be in
line with what one would intuitively expect.
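
To make the reported statistic concrete, the following Python sketch shows one common way
to compute Spearman's rank correlation: ranks are assigned with ties averaged, and the
Pearson correlation of the two rank lists is taken. This is an illustration only and not
the procedure documented by the survey; the function names are hypothetical, and the
ratings in the usage section are invented toy values, not the survey's data.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Invented PENS-style ratings (levels 1 to 5) for five imaginary languages.
    precision = [1, 2, 2, 4, 5]
    simplicity = [1, 1, 2, 3, 4]
    print(f"rho = {spearman(precision, simplicity):.2f}")
```

Because PENS values are restricted to five levels, ties are frequent, which is why the
sketch averages the ranks of tied values instead of using the shortcut formula for
untied data.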

5.2 Properties 

Let us turn to the properties. Table 3 shows the number of CNLs for each of the
properties we considered and their combinations. As some languages have been used
more extensively and over longer periods of time than others, these numbers do not
necessarily reflect the actual importance or popularity of the different language types.
The table also shows the average PENS values for each type. Again, we should be careful
when interpreting these numbers, as all languages were equally weighted, which
does not take into consideration that some languages are much more mature and
widespread than others. Nevertheless, these numbers reveal some interesting facts.

Table 3
Properties of existing CNLs with average PENS values

                                    combined with property                PENS average
property                  total     C   T   F   W   S   D   A   I   G      P    E    N    S
C  comprehensibility         45     -  17   3  40   6  33   4  33   8    2.0  4.3  4.7  1.2
T  translation               22    17   -   1  21   0  17   5  18   0    2.0  4.8  5.0  1.1
F  formal representation     54     3   1   -  52   1  19  45  10   2    4.4  2.3  3.8  3.2
W  written                   93    40  21  52   -   1  46  49  42   5    3.3  3.5  4.3  2.3
S  spoken                     7     6   0   1   1   -   6   0   1   6    2.0  1.6  3.4  1.9
D  domain-specific           53    33  17  19  46   6   -  20  29   6    2.8  3.5  4.4  1.9
A  academia                  50     4   5  45  49   0  20   -   4   1    4.3  2.5  3.9  3.1
I  industry                  43    33  18  10  42   1  29   4   -   0    2.3  4.3  4.7  1.4
G  government                10     8   0   2   5   6   6   1   0   -    2.4  2.5  3.8  2.0

For a bit less than half of the languages, the goal is to increase comprehensibility.
Formal representation is the goal of another, only slightly overlapping, half. About 22%
of all languages have translatability as their goal. There is a large overlap of the types C
and T, whereas these two barely overlap with F. Existing CNL approaches can therefore
be roughly subdivided into two groups of similar size: one consisting of languages for
improved comprehensibility and translatability, and the other made up of languages
that have formal representation as their goal. Mostly, languages of the types C and T
are domain-specific, originated from industry, and focus more on expressiveness and
naturalness than on precision or simplicity. Languages of type F, in contrast, mostly
have an academic origin and tend to have a much stronger focus on precision and
simplicity at the cost of expressiveness and naturalness.

When it comes to the distinction between written and spoken languages, we see a 
very one-sided picture: More than 90% of all languages are intended to be written; we 
found only seven languages that are intended to be spoken (one of which is intended to 
be spoken and written). The reason for this might be that controlling a spoken language 
is much more difficult in practice. Written texts can be revised and given to a language 
checker before publication, whereas spoken language typically lacks this two-stage 
process. It is an interesting fact that six out of the seven spoken languages originated 
from a governmental environment. On average, written languages have higher PENS 
values in all four dimensions. 

Concerning domain-specificity, the data are balanced. About half of the languages
are designed for a specific and narrow domain. The other half follow a more
general-purpose approach. Comprehensibility is the prevalent goal for domain-specific
languages, and they mostly originated from industry. No clear tendencies can be identified
with respect to the PENS dimensions.

Concerning the last three properties, the data show similar language counts for
academic and industrial CNLs: 50 and 43 languages, respectively. On the other hand,
only ten CNLs were found that originated from a governmental environment. It has to
be noted, however, that information about CNLs from industry is typically much scarcer
than about languages from academia or governments. It is therefore likely that most of
the languages that escaped this survey (because of missing or hard-to-find information)
are industrial ones. Such a bias might also be present in the case of some of the other
properties discussed above. In any case, academia apparently focuses much more on
languages for formal representation than for comprehensibility or translation, whereas
industry seems to have an opposite focus.

5.3 Design Decisions 

Apart from being a description of the current state of the art, Table 3 can be a valuable
tool for making design decisions when creating a new CNL. In such a situation, the
application environment of the language to be defined is typically fixed, but not yet
the inherent properties of the language itself. Those inherent language properties are
supposed to be fixed only during the design process. At the early design stage, the table
above can be used to check the level of previous work on CNLs for a given combination
of environment properties. It also delivers the PENS classes of a typical CNL in this
environment, which can be used to guide the design process.

For example, if you intend to create a domain-specific, industrial CNL to enhance
comprehensibility, the table tells you that the combination of these properties is not
unusual at all (at least pair-wise combinations). Furthermore, the table indicates that such a
language typically has a PENS class somewhere between P^E^N^S^ and P^E^N^S^. As a
second example, somebody might want to design a CNL for speech translation. A quick
look at the table reveals that no such CNL has been reported so far, which indicates
that a significant amount of original work is needed for the design of such a language.
We also see that a typical spoken CNL is very different from a typical language for
translation in terms of expressiveness and naturalness. This suggests two important
design decisions: How expressive should the resulting language be, and how natural?

The table can reveal such questions about design decisions, but of course it cannot
answer them. Nevertheless, such information about existing approaches in similar
problem domains and environments can be very valuable to focus the design effort on
the crucial aspects.
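
The two lookups described in this section can be sketched in Python as follows. This is
only an illustration of how the pair-wise counts of Table 3 might be queried; the
dictionary holds just the four pairs needed for the two examples (values copied from
Table 3), and the names PAIR_COUNTS, pairwise_counts, and is_unexplored are hypothetical.

```python
from itertools import combinations

# Pair-wise co-occurrence counts copied from Table 3. Only the pairs used in
# the two examples of this section are included; the dictionary is not the
# complete table.
PAIR_COUNTS = {
    frozenset("CD"): 33,  # comprehensibility + domain-specific
    frozenset("CI"): 33,  # comprehensibility + industry
    frozenset("DI"): 29,  # domain-specific + industry
    frozenset("ST"): 0,   # spoken + translation
}


def pairwise_counts(properties):
    """Map each pair of the given property letters to its count in Table 3."""
    pairs = combinations(sorted(properties), 2)
    return {a + b: PAIR_COUNTS[frozenset((a, b))] for a, b in pairs}


def is_unexplored(properties):
    """True if at least one pair of the requested properties has no known CNL."""
    return any(count == 0 for count in pairwise_counts(properties).values())


if __name__ == "__main__":
    # Domain-specific, industrial CNL for comprehensibility: all pairs common.
    print(pairwise_counts("CDI"), is_unexplored("CDI"))
    # CNL for speech translation: no prior language combines S and T.
    print(pairwise_counts("ST"), is_unexplored("ST"))
```

Such a check considers only pair-wise combinations, as in the first example above; a
property triple could still be rare even when all of its pairs are common.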

5.4 Timeline 

Since CNLs have been defined and used over many decades and have influenced each
other, it is interesting to draw the evolution of these languages on a timeline, as Figure 2
does. Each bar represents the "life" of a language, that is, the period when the language
was studied or used. For some languages, the year of "birth" or "death" is unknown,
which is indicated by dashed bars fading in and out. The vertical lines show influences
from other languages at the time of birth (solid for reported influences; dashed for
influences that are not reported but seem probable). The colors of the bars represent
the goals of the languages, as indicated in the legend.

The oldest CNL, Basic English, is also the most influential one. It influenced CFE,
and indirectly ILSAM, both very influential languages in their own right. Altogether,
more than 20 languages were directly or indirectly inspired by Basic English. Among 
the more recent languages, ACE is the most influential in terms of offspring languages. 

Figure 2
The timeline of the evolution of controlled English.

Looking for an overall theme in the evolution of CNL, one can identify something
that could be called three "eras": the general, technical, and logical eras. The general era
lasted until the late 1960s or early 1970s. Only a few languages were defined and used
during this time, all of which were designed to improve human comprehension and to
serve as general languages with no specific application domain or narrow community
in mind. These languages survived in their small niche when, during the subsequent
technical era that began in the early 1970s, CNLs were applied to technical documentation
for improved human comprehension as well as improved machine translation.
Again, this branch of languages did not disappear at the end of the era and continues to
be used today, but a new type of CNL emerged. During the logical era that began in the
mid 1990s, many CNLs were created with some sort of mapping to formal logic, which
enabled not only automatic processing but actual automatic interpretation. These three
eras partly correspond to the three goals introduced in Section 2.3. The first CNLs were
of type C, type T emerged in the technical era, and type F in the logical era.

5.5 Evaluations 

Finally, we can turn to a crucial aspect that we have not yet discussed: Do CNLs actually 
achieve the goals they were designed for? A number of studies have been reported that 
evaluate the supposed advantages of these languages. The relevant research question 
obviously depends on the goal the language is supposed to achieve. In their most 
general forms, the research questions for the types C, T, and F can be stated as follows:

C Does a CNL make communication among humans more precise and more 
effective? 

T Does a CNL reduce overall translation costs at a given level of quality? 

F Does a CNL make it easier for people to use and understand logic
formalisms?


Each of these general research questions can be broken down, and most studies target 
more specific questions. 

For type C, two studies on AECMA-SE showed that the use of controlled English
significantly improves text comprehension, with a particularly large effect for complex
texts and non-native speakers (Chervak, Drury, and Ouellette 1996; Shubert et al. 1995).
The results of other studies were similar but not significant (Stewart 1998). The language
CLCM has been found to have a positive effect on reading comprehension
for most groups of readers under certain circumstances such as stress situations
(Temnikova 2012).

Concerning type T, it has been reported that the use of the controlled language
MCE for machine-assisted translation leads to a "five-to-one gain in translation time"
(Ruffino 1982). Similar results have been presented for the language PACE, with which
post-editing of machine-assisted translation is "three or four times faster" than without
(Pym 1990). It has been shown that the adherence to typical CNL rules improves
post-editing productivity and machine translation quality (Aikawa et al. 2007;
O'Brien and Roturier 2007). For the language CLCM, it has been reported that CNL
texts are easier to translate than uncontrolled ones (Temnikova and Orasan 2009;
Temnikova 2012) and that the time needed for post-editing is reduced on average by
20% (Temnikova 2010, 2012).

Studies on type F can be subdivided into those that test the general usability of
CNL tools and those that specifically evaluate the comprehensibility of the actual
languages. Starting with the usability studies, it has been shown for the language CLOnE
that its interface is more usable than a common ontology editor (Funk et al. 2007).
Similarly, Coral's controlled English has been shown to be easier to use than a comparable
common query interface (Kuhn and Höfler 2012). Positive usability results for
CNL tools have also been reported for GINO (Bernstein and Kaufmann 2006), CLEF
(Hallett, Scott, and Power 2007), CPL (Clark et al. 2007), PERMIS (Inglesant et al. 2008),
Rabbit (Dimitrova et al. 2008), and ACE (Kuhn 2009). Turning to the comprehensibility
studies, it has been shown for the CLEF query language that common users are
able to correctly interpret given statements (Hallett, Scott, and Power 2007). ACE has
been shown to be easier and faster to understand than a common ontology notation
(Kuhn 2013), whereas experiments on the Rabbit language gave mixed results
(Hart, Johnson, and Dolbear 2008).

In addition to these high-level evaluations, more specific tests have
been reported such as evaluations on coverage (Bernstein et al. 2006;
Kaljurand 2007), performance, convergence (Adriaens and Macken 1995), parseability
(Wojcik, Harrison, and Bremer 1993), computational complexity (Pratt-Hartmann 2003;
Thorne and Calvanese 2010), text complexity, and text length (Temnikova 2012).

In general, there seems to be good evidence for each of the language types that the
use of CNL can be advantageous. This does not mean, of course, that CNL approaches
always perform better. This depends heavily on the precise problem domain, the
background of the users, and, perhaps most importantly, the quality of the design of the
language and its supporting tools.


6. Conclusions 


To conclude, we can come back to the aims set out in the introduction of this article.
The first goal was to get a better theoretical understanding of the nature of controlled
languages. First of all, this article shows that despite the wide variety of existing CNLs,
they can be covered by a single definition. The criteria of the proposed definition include
virtually all languages that have been called CNLs in the literature. We could show
that these languages form a widely scattered but connected cloud in the conceptual
space between natural languages on the one end and formal languages on the other.
The informal statement that CNLs are more formal than natural languages but more
natural than formal ones is substantiated and verified.

The next goal was to establish a common terminology and a common model. We
emphasized the difference between characteristics of the environments of languages on
the one hand and the properties of the languages themselves on the other. Both aspects
are important, but the second is more difficult to capture in a quantitative way. Nine
general properties have been collected to describe the application environments of
CNLs. As a novel addition to this model, we proposed the four-dimensional PENS
scheme to describe inherent language properties. This scheme allows for classification
of CNLs on a discrete scale on the dimensions of precision, expressiveness, naturalness,
and simplicity. Together, this allows us to formally model the important properties of
languages and their environments in a simple way, and to put order and structure to a
previously fuzzy and disconnected field.

The third goal was to provide a starting point for researchers interested in CNL.
The most important conclusion in this respect is the fact that many more CNLs exist
than have been found in any previous survey. Previously, the most comprehensive
overview counted 41 CNLs (Pool 2006) based on various natural languages, whereas
this survey covers 100 languages for English alone. The diversity of languages and
the different environments in which they were studied and used apparently had the
consequence that many CNL researchers and developers were not aware of a large
number of relevant languages. As a starting point for researchers, this work presents
a diverse sample of twelve important and influential languages, along with a long list
of all CNLs collected. The introduced model of languages and environments can also
facilitate the identification of a particular research focus and the collection of relevant
prior work.

The fourth goal was to help CNL developers make design decisions. To that aim, 
the data of this survey can be used to direct developers to existing CNL approaches 
in a given environment and problem domain. The data can reveal whether a certain 
kind of CNL usage is common, rare, or inexistent until now, which can be used as an 
indication of the amount of original work required. Furthermore, the typical language 
properties of CNLs in terms of precision, expressiveness, naturalness, and simplicity 
can be retrieved for a given usage scenario. This information might be very useful to 
identify important design decisions and to find existing approaches to build upon. 

I would like to conclude with the observation that the study of controlled languages
is a very dynamic and highly interdisciplinary field, for the most part occupying small
niches in the academic, industrial, and governmental worlds. However, adding all these
niches together gives us a large body of past and ongoing work. Assuming that people
will have to interact even more closely with computers and across language borders in
the future, I am convinced that we will see much more work in this area.

Appendix A: Full List of English-based Controlled Natural Languages 

Below is the full list of 100 English-based CNLs in alphabetical order. See Section 4 for
the details of this collection. 

AECMA Simplified English (AECMA-SE) (AECMA 1986) was the predecessor of ASD
Simplified Technical English. See Section Wl] — P^E^N^S^, C w D I 

AIDA (Kuhn et al. 2013) is a CNL to allow for informal and underspecified representations
of scientific assertions in an approach for semantic publishing called "nanopublications."
Single English sentences are used as a scaffold for underspecified representations
and for the inclusion of informal statements in formal RDF-based structures. These
sentences are Atomic, Independent, Declarative, and Absolute (hence the name AIDA).
This is an example:

The degree of hepatic reticuloendothelial function impairment does not differ between 
cirrhotic patients with and without previous history of SBP. 

— P^E^N^Sl, F W A 

"Airbus Warning Language" (Spaggiari, Beaujard, and Cannesson 2003 ) is a language 
for short industrial warnings, focusing on abbreviations and restricting the word order. 
This is an exemplary statement: 

ENG1 REV NOT LOCKED

— P^E^N^S^, c w D I 

ALCOGRAM (Adriaens and Schreurs 1992) is a CNL developed at Alcatel. It originated
from COGRAM as an "algorithmic variant," focusing on the use within a computer-aided
language learning tool. In contrast to COGRAM, which consists of three components
that declaratively define the language, ALCOGRAM is defined based on a four-staged
algorithm. Each of these four stages checks certain aspects: preparatory textual
control (e.g., "Define technical terms and acronyms in advance"), syntactic control (e.g.,
"Write one instruction per sentence for single actions"), lexical control (e.g., "Avoid
gender-specific language"), and micro control (e.g., "Use words for a number when it is
the first word in the sentence"). These are two examples of ALCOGRAM sentences:

Set the switch to the middle. Press the button on your right. 

When the test circuit is called, a test tone with the proper transmit level is returned. 

— C T W D A I 


ASD Simplified Technical English (ASD-STE). See Section \4J] — P^E^N^S^, C T w D I 

"Atomate Language" (Van Kleek et al. 2010) is part of the Atomate interface, which lets
users define simple automatic tasks and reminders taking context and current activity 
into account. The language was inspired by CLOnE, ACE, and the GINO and Ginseng 
systems. This is an example of such a task definition: 

Alert me when my location is home on/after Tuesdays at 5pm with the message: Trash 
day! 

A special editor supports users in writing such sentences, using a mixture of predictive
editing and conceptual authoring. The sentences are mapped to RDF and automatically
triggered when the preconditions are met. — P^E^N^S^, F w D A 

Attempto Controlled English (ACE). See Section \4J] — P^E^N^S^, F w A 


Avaya Controlled English (Avaya 2004) is a language for technical publications in the
telecommunication and computing industry. Its use should reduce translation costs and 
should make texts easier to understand for human readers. It puts restrictions on the 
lexicon (e.g., "Do not use abort"), grammar (e.g., "Use active voice"), semantics (e.g., 
"Use may only to grant permission"), and style (e.g., "Put command names in bold 
monospaced type"). An open list of about 250 words defines preferred terminology for 
the given computer and telephony domain, and clarifies usage and meaning of these 
words. These are two examples of sentences: 

This procedure describes how to connect a dual ACD link to the server. 

If the primary server fails, you can use the secondary server. 

— P^E®N®S\ C T W D I 


Basic English. See Section \4J] — P^E®N®S^, C w 

BioQuery-CNL (Erdem and Yeniterzi 2009) is a language for biomedical queries. It
serves as an interface language for a query engine based on answer set programming. 
BioQuery-CNL was initially designed as a subset of ACE with some small modifications 
handled in a preprocessing step. The ACE parser was used for processing the language. 
In later versions, however, the language diverged from ACE and evolved into an 
independent language with its own parser. This is an exemplary query: 

What are the genes that are targeted by all the drugs that belong to the category 
Hmg-coa reductase? 

— P5E2N4S^ F W D A 


Boeing Technical English (Wojcik, Holmback, and Hoard 1998) was an extension of
AECMA Simplified English to improve readability and consistency of documents, with 
the specific goal to broaden the scope beyond the aviation domain. The language seems 
to have been discontinued and apparently never came to be deployed at Boeing. — 
P^ESn^SI, C w I 


Bull Global English (Smart Communications Inc. 1994) or Bull Controlled English
is a language developed at Groupe Bull, a French computer company. It was probably
influenced by SMART Plain English. Bull Global English can be summarized by
the following ten rules (Karkaletsis and Spyropoulos 1997), which have a considerable
overlap with the rules of Caterpillar Fundamental English:


1. Make positive statements. 

2. Keep sentence length 21 words. 

3. Avoid false nomenclature. 

4. One thought per sentence. 

5. Use simple sentence structures. 


6. Use active voice and parallel construction. 

7. Avoid conditional tenses. 

8. Avoid abbreviations and colloquialisms. 

9. Use correct punctuation. 

10. Use standardised nomenclature. 


— C T W D I 


CAA Phraseology (CAA 2011) is a language for air traffic control introduced by the
Civil Aviation Authority (CAA) in the 1980s or possibly earlier. It is very similar to the
phraseologies by FAA and ICAO. — P^E^N^S^, C S D G

Caterpillar Fundamental English (CFE). See Section \4l\ — P^E^N^S^, C w D I 

Caterpillar Technical English (CTE) (Hayes, Maxwell, and Schmandt 1996;
Kamprath et al. 1998) is the second CNL developed at Caterpillar. Its development
started in 1991, that is, almost a decade after the discontinuation of CFE. Apart from
improving consistency and reducing ambiguity of technical documentation, the goal of
CTE was to improve translation quality and reduce translation costs with the help of
machine translation. This is an example of a CTE text:

This category indicates that an alternator is malfunctioning. If the indicator comes on,
drive the machine to a convenient stopping place. Investigate the cause and determine
the solution.

In contrast to CFE, texts in CTE are supposed to be translated before being given to
personnel in non-English speaking countries. As a further difference, CTE was designed to be
an "enforceable controlled English" that comes with an authoring tool that enforces
the compliance with the restrictions. The CTE lexicon consists of about 70,000 terms
with a "narrow semantic scope" (compared to CFE's less than 1,000 terms with a
broader semantic scope). The syntax is restricted too, including restrictions on the use of
conjunctions, pronouns, and subordinate clauses. CTE comes with a language checker
that allows for interactive disambiguation on the lexical level, enriches the technical
texts with SGML annotations, and uses the syntax analyzer of the KANT system (see
KANT Controlled English below). — P^E®N®S^, C T w D I 

Clear And Simple English (CASE) (Pym 1990) was a controlled English introduced
in the 1980s at the J.I. Case Company, a manufacturer of construction and agricultural 
equipment. It descended from CFE. — P^E^N^S^, C w D I 

ClearTalk (Skuce 2003) is a CNL for the Semantic Web first presented in the 1990s. Its
creator claims that documents in ClearTalk can be "almost automatically" translated 
into a formal logic notation and into other natural languages. It "offers a flexible degree 
of formality" that lets an author choose to "leave or remove ambiguity." It has been 
used to encode more than 25,000 facts in different technical domains. ClearTalk is 
heavily restricted on the syntactic level (e.g., basic sentences have the general form
subject predicate complement modifier-phrases) as well as on the semantic one (e.g., the 
determiner a at subject position represents universal quantification). These restrictions 
are expressed in a large number of rules. Two examples of sentences are shown here: 

Any adverb that modifies a verb must be adjacent to (that verb or another adverb). 

Mary hopes that [- Bill loves her -].

ClearTalk can itself be described in ClearTalk; the first example is from this
self-description. Different forms of parentheses are used to disambiguate different kinds
of scopes. — P^E^N^S^, F w A

"CLEF Query Language" (Hallett, Scott, and Power 2007) is a language used within
a system called CLEF (Clinical E-Science Framework), which is intended to help clinicians,
medical researchers, and hospital administrators to query electronic health records. The
language was influenced by the Drafter language. Basic queries are composed of three
elements: the set of relevant patients, the received treatments, and the outcomes. This is
an example:

For all patients with cancer of the pancreas, what is the percentage alive at five years for
those who had a course of gemcitabine?

Complex queries can have multiple elements of the same type. The system employs a
conceptual authoring approach for writing queries, which are then translated in several
steps into SQL and given to a database engine. — P^E^N^S^, F w D A

COGRAM (Adriaens and Schreurs 1992) was a controlled language developed in the
late 1980s for the telecommunication domain (at Alcatel). It was developed as a response
to the finding that the existing controlled languages AECMA Simplified English,
Ericsson English, and IBM's controlled English were "incomplete and defective in many
ways." COGRAM consists of a vocabulary of approximately 5,000 words plus another
1,000 technical terms, and a grammar with about 150 rules. These rules fall into three
categories: "Do not use X," "Use only X," and "Avoid (try not to use) X." Grammar
rules of the last type can be seen as style-guides that do not restrict the coverage of
the language. The language definition is divided into three components: lexical (e.g.,
"Use short infinitives of regular action verbs"), syntactic (e.g., "Do not use a participle
to introduce an adverbial clause"), and stylistic (e.g., "Expound major topics, restrict
minor topics"). The definition of COGRAM was found to be "not the most motivating
of texts for technical writers to use in the writing process," which led to the development
of ALCOGRAM. — P^ESn^SI, c t w d a i

Common Logic Controlled English (CLCE) (Sowa 2004) is a language that can be
translated into first-order logic with equality in the form of the Conceptual Graph
Interchange Format (CGIF). It is defined by a grammar in Backus-Naur form "that
allows every ambiguity to be resolved when a sentence is parsed." Some of the most
important syntax restrictions are: no plural nouns, only present tense, and variables
instead of pronouns. For an unambiguous mapping to logic, a number of interpretation
rules are applied and parentheses are used to determine the structure of deeply nested
sentences. Sentences in this language should be similar to those found in software
documentation and textbooks of mathematics, for example:

If some person x is the mother of a person y, then the person y is a child of the person x.

Declare give as verb (agent gives recipient theme) (agent gives theme to recipient)
(theme is given recipient by agent) (theme is given to recipient by agent) (recipient is
given theme by agent).

Imperative sentences, as the second example, are used to import or declare words.
Names, nouns, verbs, adjectives, adverbs, and prepositions can be declared in this way.
— P^E^N^S^ F w A
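
To make the mapping to logic concrete, the first CLCE example above can be read as the following first-order formula. This is only one plausible rendering: the predicate names person, motherOf, and childOf are assumptions chosen for illustration, and the actual CLCE translation targets CGIF rather than this textbook notation.

```latex
\forall x\,\forall y\,\bigl(\mathrm{person}(x)\land\mathrm{person}(y)\land\mathrm{motherOf}(x,y)
  \rightarrow \mathrm{childOf}(y,x)\bigr)
```

The indefinite phrases "some person x" and "a person y" occur in the antecedent of the conditional, which is why both variables are read as universally quantified over the whole implication.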

Computer Processable English (CPE) (Pulman 1996; Sukkarieh and Pulman 1999) is a
controlled language that can be "completely syntactically and semantically analysed."
An early version of the language used KIF (Knowledge Interchange Format) as its
logic formalism, whereas McLogic was used later on. The language comes with a
bidirectional grammar implemented as a Prolog unification grammar. Two examples
are shown here:

Every animal X eats some animal that is smaller than X.

Every registered user who has borrowed less than ten copies can borrow every
available copy.

The mapping to logic seems to be deterministic, even though the available literature is
not explicit about this. — P^E^N^S^, F w A

Computer Processable Language (CPL) (Clark et al. 2005) is a controlled variant of
English developed at Boeing. It is very different from earlier CNL approaches in which
Boeing was involved, such as ASD-STE and Boeing Technical English. CPL is much
more restricted than these earlier approaches and sacrifices to some degree expressiveness
and naturalness for the sake of automated reasoning support. Basic CPL sentences
are restricted to the pattern subject + verb + complements + adjuncts. There are further
restrictions on the syntax, for example that definite references have to be used instead
of pronouns. Statements involving universal quantification are constructed from seven
templates such as "If sentence1 then typically sentence2," where sentence1 and sentence2
are basic CPL sentences of the structure introduced above and where typically is a
reliability degree: one of (almost) always, usually, sometimes, and never. These are two
examples of CPL sentences:

IF a person is carrying an entity that is inside a room THEN (almost) always the person
is in the room.

AFTER a person closes a barrier, (almost) always the barrier is shut.

A parser translates CPL sentences into a frame-based language with well-defined
semantics. In contrast to most other logic-based CNL approaches with custom-built
parsers, the parsing process of CPL involves different external tools and resources. An
existing parser for unrestricted English is used to generate an intermediary logical form.
Then, WordNet and other resources are used to make a "best guess." The resulting
logical representation is then paraphrased and shown to the user for verification or
correction. — P^E^N^S^, F w I
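
The surface shape of one such template can be illustrated with a small Python sketch. It is not the CPL parser, which relies on a full English parser and external resources as described above; the function name parse_rule and the regular expression are assumptions made purely to show how a rule splits into condition, reliability degree, and consequence.

```python
import re

# Reliability degrees listed in the text.
RELIABILITY = ["(almost) always", "usually", "sometimes", "never"]

_degree = "|".join(re.escape(d) for d in RELIABILITY)
# One illustrative template: "IF <sentence1> THEN <degree> <sentence2>."
IF_THEN = re.compile(
    rf"^IF (?P<condition>.+?) THEN (?P<degree>{_degree}) (?P<consequence>.+?)\.?$"
)

def parse_rule(sentence):
    """Split an IF/THEN rule into condition, reliability degree and consequence."""
    # Collapse line breaks and repeated spaces before matching.
    match = IF_THEN.match(" ".join(sentence.split()))
    if match is None:
        raise ValueError("sentence does not fit the IF/THEN template")
    return match.groupdict()
```

Applied to the first example sentence above, the sketch separates "a person is carrying an entity that is inside a room" as the condition from "the person is in the room" as the consequence, with "(almost) always" as the reliability degree.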


Controlled Automotive Service Language (CASL) (Means and Godden 1996;
Means, Chapman, and Liu 2000) is a controlled language for writing service manuals
and bulletins at General Motors developed in the 1990s. The goal was to improve
translatability as well as consistency and readability. The approach moved from
an "author-centric model" towards a "hybrid model" that included the role of an
editor, before it went to full production in 2000 (Godden 2000). The CASL restrictions
are defined by 62 rules, including restrictions on sentence structure, word order,
vocabulary, and punctuation. This is an exemplary sentence:


Several diseases result from asbestos exposure, with latency periods of 10 to 40 years or 
longer. 

Writers are supported by a software tool called CASLChecker. — P^E^N^S^, C T w D I

"Controlled English at Clark" (Adriaens and Schreurs 1992) was a language used at
the Clark Material Handling Company. It was developed around the late 1980s and was
influenced by SMART Plain English. — P^E^N^S^, C w D I

"Controlled English at Douglas" (Kleinman 1982) was a language developed in 1979
by the McDonnell Douglas aerospace company for their technical manuals. It was based
on a dictionary of about 2,000 words (most of them verbs), favoring short and simple
words and aiming at a single word per meaning and a single meaning per word. In
addition to the words of the dictionary, "nomenclature words" can be introduced. The
goal was to improve readability, translatability, and standardization. It was probably
influenced by CFE and had itself an influence on AECMA SE. — P^E^N^S^, C T w D I

"Controlled English at IBM" (Adriaens and Schreurs 1992) was a language developed
and used at IBM in the late 1980s. It was influenced by ILSAM and might have influenced
EasyEnglish, which was also developed at IBM several years later. It relied on
a closed list of words, and writers were assisted by different instruction programs. —
P^E^N^Si, c w D I

"Controlled English at Rockwell" (Adriaens and Schreurs 1992) was a language used
at the company Rockwell International. It was developed around the late 1980s and was
influenced by SMART Plain English. — P^E^N^S^, C w D I

Controlled English to Logic Translation (CELT) (Pease and Li 2010) is a controlled
natural language presented in 2003. It is a domain-independent language inspired by
ACE. In contrast to ACE, it uses existing linguistic and ontological resources, concretely
the SUMO ontology and WordNet. These are two exemplary sentences:

Dickens writes Oliver Twist in 1837.

Every boy likes fudge.

The syntax structure of CELT sentences is deterministically parsed. Heuristics are
applied only afterwards to map the words to SUMO and WordNet. The language is
implemented as a unification grammar in Prolog. — P^E^N^S^, F w I

Controlled Language for Crisis Management (CLCM) (Temnikova 2010, 2011, 2012)
is a language for writing instructions about how to deal with crisis situations. The
language is defined by about 80 simplification rules. These simplification rules include
restrictions on text structure (e.g., "Write a title for every specific situation"),
formatting (e.g., "Separate with a new line each block of instructions"), lexicon (e.g., "avoid
technical terms"), syntax (e.g., "Avoid passive voice"), semantics (e.g., "Use only literal
meaning"), and pragmatics (e.g., "Remove unimportant information"). — P^E^N^S^,
C T W D A

Controlled Language for Inference Purposes (CLIP) (Sukkarieh 2003) is a language
based on the logic notation McLogic and influenced by CPE. It is "semantically-driven,"
meaning that it was designed around the given logic formalism and not vice versa. Two
examples are shown here:

Every student who laughs succeeds 
Smith and Jones sign five contracts 

— P^E^N'^S^ F w A 

Controlled Language for Ontology Editing (CLOnE) (Funk et al. 2007), previously
called CLIE Controlled Language, is a CNL designed as a front-end language for OWL,
covering only a small subset of it. It is defined by ten basic sentence patterns. It adds
procedural semantics on top of OWL to introduce and remove entities and axioms.
These are two examples of CLOnE sentences:

Persons are authors of documents. 

Forget everything. 

— P5E2NS^ F W A

Controlled Language Optimized for Uniform Translation (CLOUT) (Muegge 2007)
is a CNL to improve machine translation. It puts restrictions on the vocabulary and
prohibits structures such as passive voice and pronouns. — P^E®N®S^, T w I


Controlled Language of Mathematics (CLM) (Humayoun and Raffalli 2010) is a language
for expressing mathematical texts, as found in textbooks. The language is similar
to Naproche CNL and ForTheL. The grammar of CLM is implemented in GF (Grammatical
Framework) and allows for deterministic translation into first-order logic. The
goal is to automatically verify mathematical proofs. — P^E^N^S^, F w D A


Coral's Controlled English (Kuhn and Höfler 2012) is a controlled language for expressing
formal queries to annotated text corpora. It is influenced by ACE, but is much
less expressive, simpler, and more domain-specific. It is embedded into a query interface
called Coral to enable users with no particular background in computer science to
effectively use large corpora of annotated texts. This is an exemplary query:

Find all passages where a noun phrase contains a verb phrase; the verb phrase precedes 
a prepositional phrase; the prepositional phrase contains a verb "see"; 

Such queries are deterministically mapped to AQL, an existing formal query language. 
The language is defined by 51 simple grammar rules. — P^E^N^S^, F w D A 


Diebold Controlled English (DCE) (Hayes, Maxwell, and Schmandt 1996; Moore 2000)
is a controlled language developed at Diebold with the goal of making translation faster
and less expensive by assisting human translators with specific translation tools. It was
inspired by CTE, but is less strict concerning lexicon and grammar, making the approach
more flexible. It consists of three main components: a lexical database, a set of grammar
rules, and a checking tool. — P^E®N®S^, C T w D I


DL-English (Thorne and Calvanese 2010) is a Description Logic based controlled language
presented together with other similar languages to study and compare their
computational complexity. It is similar to Lite Natural Language by the same research
group. — P^E^N^S^, F w A


"Drafter Language." See Section Wl] — P^E^N^S^, F w D A 


E-Prime or E'. See Section Wl\ — P^E^N^S^, C w A 


E2V. See Section \4J\ — P^E^N^S^, F w A 


EasyEnglish (by IBM) (Bernth 1997), not to be confused with Wycliffe Associates'
EasyEnglish, is a language developed at IBM, which might have been influenced by
an earlier controlled English at the same company (Adriaens and Schreurs 1992). The
main goal of EasyEnglish was to improve machine translation. The approach is based
on a sophisticated grammar checker that returns suggestions and warnings. Apart
from detecting common grammar errors, the system can enforce the use of a certain
controlled vocabulary and can spot ambiguities. For such ambiguities, the system can
propose alternatives, but it is ultimately up to the user whether to follow the system's
suggestions or not. The problems encountered in a given document are quantified in
the form of a clarity index, which must be above a certain threshold value. Because
the restrictions of the language are only suggested and not enforced, the language is
neither more precise nor simpler than full natural English. EasyEnglish was later
extended to check not only on the sentence level but also on the document level,
and this has been implemented in a tool called EasyEnglishAnalyzer (Bernth 2006). —
piE^N^Si, c T w I

EasyEnglish (by Wycliffe Associates) (Betts 2003), not to be confused with IBM's
EasyEnglish, is a controlled language used for transcribing biblical texts. The original
goal was to improve the translation process into other languages, but EasyEnglish is also
directly used by readers with limited knowledge of English. The language is restricted
with respect to lexicon, syntax, and semantics. There are two levels: Level A makes use
of about 1,200 words, whereas level B has a larger lexicon of about 2,800 words. In either
case, the meaning of these words is restricted. For example, fair can only mean unbiased,
and to see cannot be used in the sense to meet. It is possible to use words that are not
on the list, if they are explained in separate EasyEnglish sentences. The following is an
excerpt of a text in EasyEnglish (moor is not in the lexicon and has to be explained):

The Highlands of Scotland consist of lakes, mountains and moors. The moors are flat
empty lands where no trees grow. This land is wonderful and magnificent because it is
so empty.

There is a strict sentence length limit of 20 words, and paragraphs may not contain more
than 150 words. Sentence structure is kept simple by allowing not more than two finite
clauses and not more than two prepositional phrases per sentence. Furthermore, deep
nesting and passives are restricted. In addition, texts should adhere to logical simplicity:
"EasyEnglish writers are encouraged to identify the basic idea units in a complex
sentence or paragraph and arrange them in logical order." — P^E^N^S^, C T w D
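
The two length limits are simple enough to check mechanically. The following Python sketch shows how such a check might look; it is not a tool provided by Wycliffe Associates, the function names are hypothetical, and the sentence splitter is deliberately naive (it breaks after ".", "!" or "?" followed by whitespace).

```python
import re

MAX_WORDS_PER_SENTENCE = 20    # limit stated in the text
MAX_WORDS_PER_PARAGRAPH = 150  # limit stated in the text

def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def check_paragraph(paragraph):
    """Return a list of human-readable violations of the two length limits."""
    problems = []
    total = len(paragraph.split())
    if total > MAX_WORDS_PER_PARAGRAPH:
        problems.append(
            f"paragraph has {total} words (limit {MAX_WORDS_PER_PARAGRAPH})"
        )
    for number, sentence in enumerate(split_sentences(paragraph), start=1):
        count = len(sentence.split())
        if count > MAX_WORDS_PER_SENTENCE:
            problems.append(
                f"sentence {number} has {count} words (limit {MAX_WORDS_PER_SENTENCE})"
            )
    return problems
```

The excerpt about the Highlands of Scotland passes this check, since each of its three sentences has at most eleven words. The remaining restrictions (finite clauses, prepositional phrases, nesting, passives) would need syntactic analysis and are outside the scope of this sketch.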

Ericsson English (EE) (Adriaens and Schreurs 1992) was a language developed at
Ericsson in the early 1980s, influenced by ILSAM. It is built on a closed list of acceptable
words, but other words can be introduced if accompanied by a definition in EE. —
P^E^N^Si, c w D I

FAA Air Traffic Control Phraseology. See Section\£^ — P^piN^S^, c S D G

First Order English (Pool 2006) is a controlled natural language that maps to first-order
logic. No detailed description of this language is available. — P^E^N^S^, F w A

Formalized-English (FE). See Section Wl\ — P^E^N^S^, F w A

ForTheL (Vershinin and Paskevich 2000) is a CNL for mathematical texts similar to
Naproche CNL and CLM. The name stands for "Formal Theory Language." Statements
in this language can be automatically translated into first-order logic with equality. The
following is an exemplary text:

Lemma 1. Each set has a subset. 

Proof. 0 is a subset of all sets. QED. 

— P^E^N^S^, F W D A 

Gellish English (van Renssen 2005) is a controlled language designed as a common
data language for industry. The first version was ready in 1998. Basically, it consists of
simple subject-predicate-object structures with predefined relations in the form of fixed
phrases such as "is a specialization of" and "is valid in the context of." These are two
examples:

collection C each of which elements is a specialization of animal

the Eiffel tower has aspect h1
h1 is classified as a height
h1 is qualified as 300 m

Meta-information about the context of such statements can be expressed in the form of
additional "accessory facts." Gellish builds upon a fixed upper ontology with a large
number of predefined concepts and relation types. Texts in Gellish can be transformed
into a formal tabular representation. The semantics of the language is not fully formalized,
which means that there is no mapping to an established logic formalism. Gellish
supports simple kinds of if-then rules (van Renssen 20lT), but these rules do not allow
for universal quantification over several variables in a general way. — P^E^N^S^, F w AI
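
The idea of turning such statements into a table can be sketched in a few lines of Python. This is an illustration under strong assumptions: the list of relation phrases contains only those quoted in the text or visible in its examples (the real Gellish dictionary is far larger), the three-column row format is a simplification of the actual tabular representation, and the function names are hypothetical.

```python
# Relation phrases quoted in the text or visible in its examples; the real
# Gellish dictionary is far larger, so this list is only illustrative.
RELATION_PHRASES = [
    "is a specialization of",
    "is valid in the context of",
    "is classified as a",
    "is qualified as",
    "has aspect",
]

def parse_statement(statement):
    """Split a statement into (left object, relation phrase, right object)."""
    # Try longer phrases first so a short phrase cannot shadow a longer one.
    for phrase in sorted(RELATION_PHRASES, key=len, reverse=True):
        left, found, right = statement.partition(f" {phrase} ")
        if found and left and right:
            return (left.strip(), phrase, right.strip())
    raise ValueError(f"no known relation phrase in: {statement!r}")

def to_table(statements):
    """Turn statements into rows of a simple three-column table."""
    return [parse_statement(s) for s in statements]
```

Feeding the three Eiffel tower statements to to_table yields one row per statement, each with the left-hand object, the fixed relation phrase, and the right-hand object in separate columns.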


General Motors Global English (Means, Chapman, and Liu 2000) or just Global English
is a controlled language developed at General Motors. The goal was to improve
comprehension for non-native speakers and translatability. It is defined by 15 rules
based on four principles: "be brief," "be clear," "be direct," and "be culturally alert."
These rules include a limit on the sentence length and grammatical restrictions such
as the exclusion of passive voice. The language evolved from a reduced set of twelve
of the 62 rules of the CASL language, which was developed at General Motors several
years earlier. In contrast to CASL, Global English does not come with a software tool for
checking the compliance with the restrictions. — P^E^N^S^, C T w D I


Gherkin (Nečas 2011) is a language for writing executable scenarios for software
specifications. This is an excerpt of a scenario description:

Scenario: Unsuccessful registration due to full course
Given I am a student
And a lecture "PA042" with limited capacity of 20 students
But the capacity of this course is full
[...]

The structuring words such as Given, And, and But are fixed. The restrictions on the
remaining text such as "I am a student" are implemented in ordinary programming
languages using regular expressions, and are stored in small modules called "step
definitions." The concrete step definitions are not part of Gherkin, but have to be
implemented for the particular task at hand. Gherkin is therefore highly customizable
and extensible, and the classification given here is meant to apply to a typical concrete
language that is based on Gherkin. — P^E^N^S^, F w D A
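
The interplay of fixed structuring words and regular-expression step definitions can be illustrated with a minimal Python sketch. It does not use any real Gherkin tool: the step decorator, the registry, and the two handlers are hypothetical names invented here, and the keyword list adds When and Then (standard Gherkin keywords) to the three words mentioned above.

```python
import re

# Hypothetical registry of step definitions: (compiled pattern, handler).
STEP_DEFINITIONS = []

def step(pattern):
    """Register a handler for step text matching the given regular expression."""
    def decorator(func):
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return decorator

@step(r"^I am a (\w+)$")
def set_role(context, role):
    context["role"] = role

@step(r'^a lecture "([^"]+)" with limited capacity of (\d+) students$')
def create_lecture(context, code, capacity):
    context["lecture"] = {"code": code, "capacity": int(capacity)}

# Fixed structuring words; everything after them is free text.
KEYWORDS = ("Given", "When", "Then", "And", "But")

def run_step(line, context):
    """Strip the fixed structuring word, then dispatch on the remaining text."""
    keyword, _, text = line.strip().partition(" ")
    if keyword not in KEYWORDS:
        raise ValueError(f"unknown structuring word: {keyword!r}")
    for pattern, handler in STEP_DEFINITIONS:
        match = pattern.match(text)
        if match:
            handler(context, *match.groups())
            return handler.__name__
    raise LookupError(f"no step definition matches: {text!r}")
```

With these two definitions, the first two steps of the scenario above are accepted, whereas the step beginning with But is rejected until a matching step definition is added. This is the sense in which the concrete language depends on the step definitions written for the task at hand.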

"GINO's Guided English" (Bernstein and Kaufmann 2006) is a language used in
GINO, a system to query and edit ontologies. The language was influenced by Ginseng
and supports the same kinds of queries. In addition, GINO has some limited support
for procedural statements to introduce new entities, for instance:

There is a subclass of class water area named lake. 

Query statements are mapped to SPARQL and procedural statements map to OWL 
axioms to be added or modified. Queries can exhibit structural ambiguity, in which 
case the system evaluates all possible interpretations and shows to the user the union of 
their answers. The grammar that describes the language consists of 120 grammar rules. 
— P^E^N^S^, F w A 

"Ginseng's Guided English" (Bernstein et al. 2006) is a CNL used in a system called
Ginseng, which is a query interface to access knowledge bases in the form of OWL
ontologies. The vocabulary for the language is loaded from the respective ontologies.
These are two examples of queries:

What are the capitals of states that border Nevada?

Is there a city that is the highest point of a state?

The grammar consists of 120 static grammar rules plus additional dynamic rules generated
from the ontologies. — P^E^N^S^, F w A

Hyster Easy Language Program (HELP) (Smart 2003) is a controlled English developed
in the 1980s for maintenance manuals for lift trucks. It is based on SMART Plain English
and thus indirectly on CFE (Pym 1990). — P^E^N^S^, C w D I


ICAO Phraseology (Eurocontrol 200^) is a controlled language for air traffic control
defined by the International Civil Aviation Organisation (ICAO) in the 1980s or even
earlier. It is very similar to the phraseologies by FAA and CAA. — P^E^N^S^, C S D G

"ICONOCLAST Language" (Power 1999) is a CNL to write patient information
leaflets. It is similar to the Drafter language. A conceptual authoring approach is employed
and a formal logic representation is used in the background. This is a simple
example:

If you develop a rash, you should consult your doctor. 

— P'^E^N^S^ F W D A 

iHelp Controlled English (iCE) is a language developed by iHelp Ltd, a documentation
consultancy company. iCE consists of "a set of flexible rules and vocabularies for
companies wishing to standardise and improve their information." — P^E^N^S^, C T W I


iLastic Controlled English (iLastic 2012) is a language to allow non-developers to write
intuitive and natural scripts that automatically retrieve, transform, and combine data
from the web, databases, files, and other resources. This is an exemplary statement:

delete all files under the tmp folder if the space of the disk is lower than 1024. 

— P^E^N^S^, F W I 


International Language of Service and Maintenance (ILSAM) (Pym 1990) is an influential
language similar to Caterpillar Fundamental English, from which it was derived
in the 1970s. — P^E^N^S^, C w D I


ITA Controlled English (ITA CL) (Mott 2010) is a controlled language defined by the
International Technology Alliance, a US/UK military research program. It is inspired
by CLCE, but is less strict in terms of precision: It has an "informal meaning and a
semi-formal mapping to predicate logic." The following are two examples of statements of
different types:

if (the person X has the person Y as brother ) and (the person Z has the person X as 
father ) then (the person Z has the person Y as uncle ). 

"the plan has failed" because "there was a misunderstanding". 

The first example shows a "logical rule"; the second example is a "rationale" statement. 
Parentheses and variables are used to disambiguate. Around 90 grammar rules define 
the language. — P^E^N^S^, F w I 


KANT Controlled English (KCE) (Mitamura and Nyberg 1995) is a controlled natural
language for machine translation used within the KANT translation system. The language
was first presented under this name in 1995, but it had at that point already
been studied and used for several years. The focus is on technical documents, and KCE
was the basis for the development of Caterpillar Technical English. Lexicon, grammar,
and semantics are restricted. In addition, ambiguities are resolved interactively by
augmenting the input sentences with SGML tags. In the following sentence, for example,
the attachment of the prepositional phrase "with twelve rivets" is ambiguous:

Secure the gear with twelve rivets. 


In KCE, this ambiguity can be resolved by augmenting the sentence with an SGML tag,
for instance "Secure the gear with <attach head='secure' modi='with'> twelve rivets."
For the classification of the language, the question arises whether the SGML tags are part
of the language or just a method to keep track of decisions concerning ambiguities. The
SGML tags positively contribute to the precision of the language but heavily impede its
naturalness. Since such markup tags are usually hidden and since KCE texts are initially
written without tags, which are added only afterwards, we consider them a part of the
KANT methodology but not of the controlled language itself. — P^E^N^S^, T W A

5 http://www.1indy-hop.co.uk/iHelp/ice/


Kodak International Service Language (KISL) is a CNL developed
at Kodak in the early 1980s. Some see it as a descendant of CFE
(Spaggiari, Beaujard, and Cannesson 2003). — P^E^N^S^, C w D I


Lite Natural Language (Bernardi, Calvanese, and Thorne 2007) is a CNL based on the
language E2V and its variants. It has a deterministic mapping to DL-Lite, which is a
logical formalism optimized for good computational properties and is equivalent to a
subset of OWL. — p5E2N^S^ F w A

"Massachusetts Legislative Drafting Language" (Massachusetts Senate 2003) is a restricted
language for legal texts defined by the Massachusetts Senate. Its purpose is "to
promote uniformity in drafting style, and to make the resulting statutes clear, simple
and easy to understand and use." The language is defined by about 100 rules that
restrict syntax (e.g., "Use the present tense and the indicative mood"), semantics (e.g.,
"Do not use 'deem' for 'consider'"), and document structure ("Use short sections or
subsections"). In addition, there are close to 90 words and phrases that must not be
used, with suggested replacements for each of them (e.g., hide instead of conceal, and
rest instead of remainder). — P^E^N^S^, C w D G

"MILE Query Language" (Piwek et al. 2000) is a language to access maritime rules and
regulations. It follows the conceptual authoring approach in much the same way as the
Drafter and CLEF languages. — P^E^N^S^, F w D A

Multinational Customized English (MCE) (Ruffino 1982) is a controlled language developed
at Xerox to improve the quality of machine-assisted translation. It was based
on ILSAM (Adriaens and Schreurs 1992). It uses a restricted domain-specific vocabulary
and "a set of writing rules which encourage a clear, concise English and a minimization
of ambiguities." — P^E^N^S^, T w D I

Nortel Standard English (NSE) (Smart 2006) is a language developed at Nortel, a
telecommunications equipment manufacturer. The development started in 1995 with
the help of SMART Communications, and the language was probably influenced by
SMART Plain English. — P^E^N^S^, C w D I

Naproche CNL (Cramer et al. 2010) is a controlled language for mathematical texts
similar to CLM and ForTheL. Texts in Naproche CNL can be deterministically mapped
to first-order logic and then automatically checked for logical correctness. The following
is an excerpt of a proof written in this language:

Axiom 3: For every x, x' ≠ 1.

Axiom 4: If x' = y', then x = y.

Theorem 1: If x ≠ y then x' ≠ y'.

Proof: Assume that x ≠ y and x' = y'. Then by axiom 4, x = y. Qed.

According to its authors, most texts of mathematical textbooks "can be rewritten in the
Naproche CNL in such a way that they resemble the original text." — P^E^N^S^, F w D A

NCR Fundamental English (NCR 1978) is a CNL developed at NCR Corporation. The
language was used for the technical manuals of the company in order to make them
"easier to read and use by NCR employees and customers around the world." These
are two examples of sentences:

While repairing the unit, the field engineer also performs normal maintenance if it is
needed.

No maintenance can be performed until the maintenance lock has been activated.

The language consists of three parts: nomenclature, glossary, and vocabulary. Every
word of the language belongs to exactly one of these categories. The nomenclature is
an open set of different kinds of named individual entities, such as names of products,
tools, routines, as well as named modes and conditions. The glossary is another open
set of words for technical concepts, such as audit trail, that cannot be replaced by a
phrase or brief clause using the words of the vocabulary. The vocabulary, finally, is the
most interesting part. It consists of a fixed set of 1,350 words (verbs, nouns, adverbs,
adjectives, pronouns, prepositions, articles, and conjunctions) plus 650 abbreviations.
The content of the vocabulary ranges from fundamental words such as a, not, and in
to domain-specific terms such as testware, calibrate, and taxable. The meaning of these
words is restricted, and each comes with a definition in full English. The noun medium,
for instance, is defined as "a method of payment" and must not be used in any other
sense. The grammar is not explicitly restricted. — P^E^N^S^, C w D I

Océ Controlled English (Cucchiarini 2002) is a controlled language developed at Océ,
a Dutch company in the printing and copying business. Océ Controlled English is
combined with traditional machine translation techniques to improve the translation
quality of the company's documentation in 17 different languages. One of the important
properties of the language is that it leads to more concise texts. For example, instead of
"In several windows, an icon shows the current status/activity of a printer. See the list
below for a description of each status.", one would write:

These icons show the status or activity of the copier.

The language is implemented with the help of the MAXit Checker by SMART
Communications. — P^E^N^S^, T w D I


OWL ACE (Kaljurand and Fuchs 2006) is a controlled language for the ontology language
OWL. Syntactically, it is a subset of ACE. Semantically, it is tailored towards
the expressiveness of OWL and is more specific than ACE with its underspecified
semantics, particularly in the case of plurals. Thus, OWL ACE is more precise but less
expressive than ACE. — P^E^N^S^, F w A

"OWLPath's Guided English" (Valencia-García et al. 2011) is a query language for a
tool called OWLPath, with which ontologies can be queried. Statements in this language
start with the phrase View any. These are two examples:

View any COMMODITY has_quoted_price in BMF.

View any COMPANY whose STOCK_PRICE.lastTrade is_greater_than $30 and
is_included_in Dowjones in 2009-04-24.

These statements are translated into the SPARQL query language. Even though their
structure roughly follows English grammar, they cannot be considered valid English
sentences. — P^E^N^S'^, F w A


OWL Simplified English (Power 2012) is a controlled language for the Semantic Web.
In contrast to most other approaches, there is no real lexicon, neither built-in nor
user-defined. Only a very small number of function words are predefined, and users have
to list the verbs they intend to use. All other word categories are inferred based on
syntactic clues such as capitalization and adjacent words. This is an example (assuming
that governed and lives are listed as verbs):

London is capital of a country that is governed by a man that lives in Downing Street. 

— p5E2N^S^ F W A

"PathOnt CNL" (Kim et al. 2005; Namgoong and Kim 2007) is a controlled language
developed for a tool called PathOnt. The tool is multilingual, supporting English and
Korean. Statements in this language are deterministically mapped to RDF triples. These
are two exemplary sentences:

Nam is a student supervised by a professor named Kim. 

A received specimen fixed in formalin is a soft tissue mass. 

The language seems to cover only simple existential statements. — P^E^N^S'^, F w A 


PENG (Schwitter 2002) is a controlled language whose name stands for "Processable
English." It is a rich but unambiguous language that can be automatically translated 
via discourse representation structures into first-order logic with equality. It is inspired 
by ACE, and the approach has a strong focus on predictive editing. These are two 
examples: 

Every animal A eats all plants or eats all animals B that are smaller than A and that eat 
some plants. 

While the fox sleeps, the cat chases a bird. 

— P^E^N^S^, F W A 


PENG-D (Schwitter and Tilbrook 2004) is a language derived from PENG, the main
difference being that PENG-D builds upon RDF and OWL instead of discourse
representation structures. — P^E^N^S^, F w A


PENG Light (Schwitter 2008) is another language derived from PENG. It maps to the
TPTP notation for first-order logic. — P^E^N^S^, F w A 


Perkins Approved Clear English (PACE) (Pym 1990) is a controlled language developed
at Perkins, a diesel engine manufacturer and now a subsidiary of Caterpillar. The
language was introduced in 1980 and was based on ILSAM. The goal was to improve
machine-assisted translation. In order to avoid the use of synonyms, PACE comes with
a dictionary, which has been gradually extended and counted 2,500 entries in 1990, such
as "passage (n): A drilling along which a fluid moves." PACE is summarized in "Ten
Rules of Simplified Writing":


1. keep sentences short
2. omit redundant words
3. order the parts of the sentence logically
4. do not change constructions in mid sentence
5. take care with the logic of 'and' and 'or'
6. avoid elliptical constructions
7. do not omit conjunctions or relatives
8. adhere to the PACE dictionary
9. avoid strings of nouns
10. do not use 'ing' unless the word appears thus in the PACE dictionary


The aim of the first five rules is to make the text short and simple, while the last five 
rules have the somewhat opposing objective to make the text more explicit. This is an 
example consisting of two PACE sentences: 

Loosen the pivot fasteners of the dynamo or of the alternator. Loosen also the fasteners 
of the adjustment link.

— C T W D I


PERMIS Controlled Natural Language (Inglesant et al. 2008) is a language for expressing
access control policies for grid computing environments. It is based on CLOnE with
specific extensions for authorization policies:

Staff can print on HP LaserJet 1. 

I trust David to say who managers are. 

Such statements are mapped to different formal target notations. Each statement follows 
one of only nine statement patterns. — P^E^N^S^, F w D A


"PILLS Language" (Bouayad-Agha, Power, and Belz 2002) is a language for medical
information documents used in a system called PILLS. It follows an editing approach
similar to that of the ICONOCLAST language, which was developed a couple of years earlier
by the same research group. With the PILLS approach, different types of documents
can be automatically generated from a master document and translated into different 
languages. — P^E^N^S^, T F D A 


Plain Language or Plain English (SEC 1998; PLAIN 2011) is an initiative by the US
government and other organizations. It had its origins in the 1970s with the goal to 
make official documents easier to understand and less bureaucratic. "Use pronouns to 
speak directly to readers" and "Avoid double negatives and exceptions to exceptions" 
are two exemplary rules. Unlike other such style guides, many of the guideline rules are 
strict and, with the Plain Writing Act of 2010, US governmental agencies are obliged to 
comply with them. With the focus being on human understandability and acceptance, 
documents in Plain Language do not seem to be considerably more precise or simpler 
from a computational point of view, when compared to full English. — P^E^N^S^, C w G 


PoliceSpeak (Johnson 2000) is a language developed to improve police communications
of English and French officers at the Channel Tunnel. The goal was to "make police
communications more concise, more predictable, more stable and less ambiguous." The 
project was launched in 1988 and the language was ready in 1992. It has a similar goal 
and application area as SEASPEAK and the different air traffic control phraseologies. — 
P^E^N^S^, c S D G


"PROSPER Controlled English" (Grover et al. 2000) is a language for the specification
and verification of hardware designs, developed in the late 1990s. The language is based
on a restricted version of a general English grammar. Sentences of the language can be
automatically mapped to a certain type of temporal logic. This is an exemplary sentence:

If sigi is high and then is low on the next cycle, then sigo is low and after one cycle 
becomes high and then after one more cycle becomes low. 

Ambiguity is not completely eliminated, but ambiguous sentences can be automatically 
spotted and reported to the user. — P'^E^N'^S^, F w D A 

Pseudo Natural Language (PNL) (Marchiori 2004) is a language designed as a
user-friendly language for the Semantic Web. It builds upon RDF and first-order logic, and
uses Prolog to calculate inferences. These are two exemplary sentences:

JOHN represents the person "John Smith" from the company "http://www.example.com/staff".

if IMPLY has as ARGUMENTS X and Y in this order, then X LOGICAL-IMPLY Y. 

Upper-case words such as JOHN act as variables that can be instantiated with concrete 
definitions involving URIs. PNL is unambiguous and has well-defined semantics, but
unnatural capitalization diminishes the naturalness of the language. Its structure looks
simple at first sight, but rather complex rules have to be applied in order to resolve
ambiguous syntax trees. — P^E^N^S^, F w A

"Quelo Controlled English" (Franconi et al. 2011) is a language introduced in 2010 and
used in a query interface called Quelo. This is an exemplary query: 

I am looking for something. It should be equipped with an automatic transmission 
system and sold by a car dealer. The car dealer should sell a fleet car. 

Following a conceptual authoring approach, users cannot directly edit the sentences, 
but they can trigger modification actions on the underlying formal representation. — 
P^E^N^S^, F w A 

Rabbit (Hart, Johnson, and Dolbear 2008) is a controlled language for OWL. It has been
developed and used by Ordnance Survey, Great Britain's national mapping agency.
Rabbit is designed for a specific scenario, in which it is used for the communication between
domain experts and ontology engineers to create ontologies. Three types of statements 
are supported: declarations, axioms, and import statements. These are examples of the 
first and second type: 

Sheep is a concept, plural Sheep. 

Every River flows into exactly one of River, Lake or Sea. 

The language is quite simple, being defined by a small number of sentence patterns and 
some modifications thereof. — P^E^N^S^, F w G 

Restricted English for Constructing Ontologies (RECON) (Barkmeyer and Mattas 2012)
is a language to represent facts and rules in an industrial
environment, where these facts and rules have a deterministic mapping to first-order 
logic. This is an exemplary sentence: 

If any container contains part of a shipment, it contains no other shipment. 

The language is defined by around 200 rules in Backus-Naur form. — P^E^N^S^, F w A G 

Restricted Natural Language Statements (RNLS) (Breaux and Anton 2005;
Breaux, Anton, and Doyle 2008) is a language for policy statements and software
engineering goals introduced in 2004. The following are two exemplary RNLS
statements:

RNLS #1: The customer will select access codes.

RNLS #2: The provider will recommend (RNLS #1) to the customer.

The second sentence refers to the first one using its identifier RNLS #1. There is a
mapping between RNLS and Description Logic, but it is not clear whether this mapping
is automated. — P^E^N^S^, F w D A

RuleSpeak (Ross 2003; OMG 2008; Ross 2013) is a CNL for business rules. The
development of the language started in 1985 and it was first presented in 1994. It is
very similar to SBVR Structured English, which emerged later. Each RuleSpeak rule
belongs to one of eleven "functional categories" such as "computation rule," "inference
rule," and "process trigger." For each of these categories specific templates are defined.
Computation rules, for example, contain the phrase "must be computed as" (or simply
"="). The first of the following two examples is such a computation rule:

A product's cost must be computed as the sum of the cost of all its components.

An order may be accepted only if all of the following are true:

- It includes at least one item.

- It indicates the customer who is placing it.

Sometimes the color codes of SBVR Structured English are adopted to emphasize the
different types of the sentence constituents. Like SBVR Structured English, RuleSpeak
is linked to the SBVR standard, which provides formal semantics based on second-order
logic with Henkin semantics. However, the mapping from RuleSpeak texts to the logical
representation is only defined in an informal way. The strict templates considerably
simplify the language, but there is no formal grammar that would fully define the
language. — P^E^N^S^, C F w I 
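
To illustrate how the two RuleSpeak example rules could be read operationally, the following Python sketch encodes them directly. It is only an informal reading under our own assumptions and is not derived from RuleSpeak or the SBVR standard; the names Component, Product, Order, and acceptance_permitted are hypothetical. Note that "only if" states a necessary condition, so the function merely reports whether acceptance is not ruled out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    cost: float


@dataclass
class Product:
    components: List[Component] = field(default_factory=list)

    @property
    def cost(self) -> float:
        # "A product's cost must be computed as the sum of the cost
        # of all its components."
        return sum(c.cost for c in self.components)


@dataclass
class Order:
    items: List[Product] = field(default_factory=list)
    customer: Optional[str] = None


def acceptance_permitted(order: Order) -> bool:
    # "An order may be accepted only if all of the following are true:"
    # Both conditions are necessary; the rule does not say they are sufficient.
    includes_item = len(order.items) >= 1      # "It includes at least one item."
    names_customer = order.customer is not None  # "It indicates the customer who is placing it."
    return includes_item and names_customer
```

The computation rule becomes a derived property rather than a stored value, which mirrors the template "must be computed as"; the second rule becomes a guard that a caller would check before accepting an order.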

SBVR Structured English See Section Wl\ — P^E^N^S^, C F w I 


SEASPEAK (Strevens and Johnson 1983) is an "International Maritime English" designed
for clear communication among ships and harbors. Its development started in
1981. It is a controlled phraseology similar to PoliceSpeak and the different air traffic
control phraseologies. — P^E^N^S^, C S D G


SMART Controlled English (Smart 2006) is a "more advanced version" of ASD Simplified
Technical English, developed by the company SMART Communications. It was
probably influenced by SMART Plain English, and has been applied to different areas.
This is an excerpt of a document in SMART Controlled English:

When the Quaternary Pump starts operation, the plunger moves inside the chamber. 

This movement lets the computer calculate and store a position called "Top Dead 
Center" (TDC). 

The language is implemented in a tool called MAXit Checker, which is able to spot 
violations of the restrictions of the language. — P^E®N®S^, C T w I 


SMART Plain English, sometimes called Plain English Program (PEP), is a controlled 
language developed and used at SMART Communications since the mid-1980s (see footnote 6). It
is based on CPE and was the basis for HELP and the controlled languages at Clark
and Rockwell (Adriaens and Schreurs 1992). As for SMART Controlled English, the tool
MAXit Checker can be used to create compliant documents. — P^E^N^S^, C w I 

"Sowa's syllogisms. " See Section^ — p5ElN^S^ F w A 

Special English (Voice of America 2009) is a simplified English developed and used 
by the Voice of America (VOA), the official external broadcast institution of the US 
government. The language has been used since 1959 and is still used today for news 
on radio, television, and the web. This makes it the second oldest English-based CNL 
(after Basic English) and the only one that has been in use for such a long period by the 
same organization. At the time of its creation, Special English was probably influenced
by Basic English. The vocabulary is restricted to about 1,500 words, which have changed 
over time. Sentences should be short and should be spoken at a slower speed. There are 
no explicit restrictions on grammar or semantics. — P^E^N^S^, C w S G 

SQUALL (Ferré 2012) is a controlled natural language in the area of the Semantic Web to
query and update RDF graphs. Sentences in this language are translated into the query
language SPARQL, whereby structural ambiguity is resolved based on a few syntactic
rules. This is an example:

for every publication ?X, ?X has an author ?A and ?A cite-s ?X

The language is defined by about 50 simple grammar rules. — P^E^N^S^, F w A

Standard Language (SLANG). See Section Wl\ — P^E^N^S^, C F w D I 


6 http://www.smartny.com/plainEnglish.htm

Sun Proof (Wells Akis and Sisson 2002) is a controlled language introduced at Sun for
their technical documentation. The initial development of the language lasted from 1999 
until 2002. The general objective was to write texts that are "easier to understand and 
to translate for humans as well as machines" but with a clear focus on translatability. 
Sun Proof is restricted by three sets of guidelines: style guidelines, grammar rules, and 
terminology. One of the most important rules is the limitation of the sentence length to 
25 words. Other rules include semantic restrictions such as using may only for granting 
permission. This is an exemplary sentence: 

This chapter provides an overview of the standardized solutions that are required to 
make the transition from IPv4 to IPv6.

— C T W D I 
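
The sentence length rule of Sun Proof lends itself to a simple automatic check. The following Python sketch is our own illustration and not a tool described in the source; the function names and the naive sentence splitter are assumptions, and only the limit of 25 words is taken from the description above. Semantic restrictions such as the permitted use of may are not covered.

```python
import re

MAX_WORDS = 25  # sentence length limit stated for Sun Proof


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_words(sentence):
    """Count whitespace-separated tokens containing at least one word character."""
    return sum(1 for token in sentence.split() if re.search(r"\w", token))


def long_sentences(text, limit=MAX_WORDS):
    """Return (sentence, word_count) pairs for sentences exceeding the limit."""
    violations = []
    for sentence in split_sentences(text):
        n = count_words(sentence)
        if n > limit:
            violations.append((sentence, n))
    return violations
```

Applied to the exemplary sentence above, count_words yields 20, so long_sentences reports no violation.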


Sydney OWL Syntax (SOS) (Cregan, Schwitter, and Meyer 2007) is a controlled language
introduced in the context of the Semantic Web. It is based on PENG and provides
a bidirectional and complete mapping to the ontology language OWL. These are two 
exemplary sentences: 

The class adult is fully defined as any person that has at least 20 as an age. 

If X has Y as a father then Y is the only father of X. 

— P^E^N^S^, F W A 


Template Based Natural Language Specification (TBNLS) (Esser and Struss 2007) is
a CNL approach for functional tests of control software for passenger vehicles. The 
language is defined by 15 templates that provide a mapping to propositional logic with 
temporal relations. This is an exemplary sentence: 


If [Button B4 is down]P1 occurs, then [Lamp L3 is red]P2 hold immediately, until [10 seconds]T1 elapsed.

P1 and P2 represent the propositional variables for the respective boxes (rendered here in square brackets), and T1 is a time
variable. — P^E^N^S^, F w D A I 
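
To make the template idea more concrete, the following Python sketch extracts the two propositions and the time variable from the exemplary TBNLS sentence and checks them against a recorded trace. It is a hedged illustration only: the regular expression covers just the one template shown above, the boxed parts are assumed to be given as plain text, and the temporal reading (whenever P1 holds at a step, P2 must hold for the next T1 one-second steps) is our own assumption rather than the semantics defined by the authors. The names parse and satisfied are hypothetical.

```python
import re

# Hypothetical pattern for the single template shown in the example sentence.
TEMPLATE = re.compile(
    r"^If (?P<P1>.+?) occurs, then (?P<P2>.+?) hold immediately, "
    r"until (?P<T1>\d+) seconds elapsed\.$"
)


def parse(sentence):
    """Return the propositions P1, P2 and the duration T1, or None if no match."""
    m = TEMPLATE.match(sentence)
    if m is None:
        return None
    return {"P1": m.group("P1"), "P2": m.group("P2"), "T1": int(m.group("T1"))}


def satisfied(rule, trace):
    """Check a trace of (p1, p2) boolean pairs, one pair per second.

    Assumed reading: if P1 holds at step t, P2 must hold at every recorded
    step from t up to (but not including) t + T1.
    """
    for t, (p1, _) in enumerate(trace):
        if p1:
            window = trace[t : t + rule["T1"]]
            if not all(p2 for _, p2 in window):
                return False
    return True
```

A test harness could evaluate P1 and P2 against signal values at each step and pass the resulting boolean pairs to satisfied.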


ucsCNL (Barros et al. 2011) is a controlled natural language for use case specifications
in the area of automated software testing. The language is intended to be unambiguous 
and is defined by a small number of simple grammar rules. There are imperative 
sentences to describe user actions, as well as declarative statements to describe the 
system state before and after user actions: 

After creating a message with 100 characters, go to the drafts folder 
The imported media file is a music file 

— P5E2N^S^ F W D A 

Voice Actions (see footnote 7) are a CNL for spoken action commands on the Android mobile phone
platform. Currently, the language covers twelve informally defined command patterns 
such as "map of," "note to self," and "create a calendar event." The following is an 
example: 

Create a calendar event: Dinner in San Francisco, Saturday at 7:00PM 

These spoken commands can be automatically interpreted and executed by the system. 

— P^E^N^S^ F S D I 


7 http://support.google.com/android/bin/answer.py?hl=en&answer=1715292